ON THE TWO DIMENSIONAL BILINEAR HILBERT TRANSFORM 



CIPRIAN DEMETER AND CHRISTOPH THIELE 



Abstract. We investigate the Bilinear Hilbert Transform in the plane and the pointwise convergence of bilinear averages in Ergodic theory, arising from $\mathbb{Z}^2$ actions. Our techniques combine novel one and a half dimensional phase-space analysis with more standard one dimensional theory.

1. Introduction

In [10], [11], the following bounds were proved for the one dimensional Bilinear Hilbert Transform.

Theorem 1.1. Let $\beta \notin \{0,1\}$. The bilinear operator defined by the principal value integral
\[ H(f,g)(x) = \int_{\mathbb{R}} f(x+t)\,g(x+\beta t)\,\frac{dt}{t} \]
satisfies
\[ \|H(f,g)\|_{p_3'} \lesssim \|f\|_{p_1}\|g\|_{p_2} \]
whenever $\frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}=1$, $1<p_1,p_2\le\infty$ and $\frac{2}{3}<p_3'<\infty$.

In this paper we will investigate two dimensional versions of this result. 
More precisely, let $K : \mathbb{R}^2\setminus\{(0,0)\}\to\mathbb{R}$ be a Calderón-Zygmund kernel, that is a kernel satisfying
\[ |\partial^{\alpha}\widehat{K}(\xi,\eta)| \lesssim \|(\xi,\eta)\|^{-|\alpha|}, \]
for all $\alpha\in\mathbb{Z}_+^2$ with $0\le|\alpha|\le N^4$, and all $(\xi,\eta)\neq(0,0)$. Here $N$ is a large enough positive integer, whose value will not be specified.

We also consider the matrices $A_1,A_2\in M_2(\mathbb{R})$, and the associated two dimensional Bilinear Hilbert Transform
\[ T_{A_1,A_2}(F_1,F_2)(x,y) := \int_{\mathbb{R}^2} F_1((x,y)+A_1(t,s))\,F_2((x,y)+A_2(t,s))\,K(t,s)\,dt\,ds. \]
We will assume at least one of the $A_i$ is not singular. Due to symmetry, we may and will assume that $A_1$ is not singular. We will investigate the mapping properties of $T_{A_1,A_2}$ in terms of the spectrum $\mathrm{Spec}(B)$ of $B := A_2A_1^{-1}$.

These questions have parallel interest in Ergodic theory. We investigate the implications 
of our analysis to Ergodic theory in the last section of the paper. 

This material is based upon work supported by the National Science Foundation under 
agreement No. DMS-0635607. In addition, the first author was supported by NSF Grant 
DMS-0556389. The second author was supported by NSF Grant DMS-0701302. Any 
opinions, findings and conclusions or recommendations expressed in this material are 
those of the authors and do not necessarily reflect the views of the National Science 
Foundation. 

2. Classification 

We first note that by the change of variables $A_1(t,s)\to(t,s)$ it suffices to analyze operators of the form
\[ \int_{\mathbb{R}^2} F_1((x,y)+(t,s))\,F_2((x,y)+B(t,s))\,K(t,s)\,dt\,ds. \]
Indeed, since $A_1$ is nonsingular, $\|A_1^{-1}(t,s)\|\sim\|(t,s)\|$, and the kernel $K(A_1^{-1}(t,s))$ remains Calderón-Zygmund.

By dualizing, it suffices to consider instead the associated trilinear forms, defined by
\[ \Lambda_B(F_1,F_2,F_3) := \int_{\mathbb{R}^4} F_1((x,y)+(t,s))\,F_2((x,y)+B(t,s))\,F_3(x,y)\,K(t,s)\,dx\,dy\,dt\,ds, \]
and to understand the range of exponents $p_i$ for which we have$^1$
\[ |\Lambda_B(F_1,F_2,F_3)| \lesssim \prod_{i=1}^{3}\|F_i\|_{p_i}. \]

If $B$ is similar to another matrix $C$, say $C = ABA^{-1}$, then $\Lambda_B$ and $\Lambda_C$ have the same mapping properties. To see this, write
\[ \Lambda_B(F_1,F_2,F_3) = \int_{\mathbb{R}^4} F_1^A(A(x,y)+A(t,s))\,F_2^A(A(x,y)+AB(t,s))\,F_3^A(A(x,y))\,K(t,s)\,dx\,dy\,dt\,ds, \]
with $F^A(x,y) := F(A^{-1}(x,y))$. Note that $F$ and $F^A$ have comparable $L^p$ norms. By changing variables $A(x,y)\to(x,y)$ and $A(t,s)\to(t,s)$ we recover $\Lambda_C(F_1^A,F_2^A,F_3^A)$, with the Calderón-Zygmund kernel $K(A^{-1}(t,s))$ in place of $K$ and up to a constant Jacobian factor, and the claim follows.

1 We restrict attention to the Banach space case $1<p_i<\infty$

The forms $\Lambda_B$ are associated with multipliers that are singular on the linear subspace (called the singularity) of $\mathbb{R}^6$ determined by the system of equations$^2$
\[ \begin{cases} \xi_1+\xi_2+\xi_3 = 0 \\ \eta_1+\eta_2+\eta_3 = 0 \\ \xi_1+b_{11}\xi_2+b_{21}\eta_2 = 0 \\ \eta_1+b_{12}\xi_2+b_{22}\eta_2 = 0. \end{cases} \]
Here $(\xi_i,\eta_i)$ are the frequency variables of $F_i$. The profile of the form $\Lambda_B$ depends on the extent to which its singularity is the graph of $(\xi_i,\eta_i)$ over $(\xi_j,\eta_j)$, for $i,j\in\{1,2,3\}$. This in turn can fail for one or more pairs $(i,j)$, giving rise to degeneracies. The appearance of a hierarchy of degeneracies is the main new phenomenon in two dimensions that we address in this paper. It prompts us to use what we think of as one and a half dimensional time-frequency analysis. The one dimensional Bilinear Hilbert Transform from Theorem 1.1 has only one type of degeneracy, when $\beta=0$ or $\beta=1$. In this case the operator is reduced to a linear Hilbert Transform, possibly applied to a product of two functions.
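The system defining the singularity can be checked mechanically. The following is a minimal sketch, assuming the reading of the four equations in which the last two involve the transpose of $B$, and ordering points of $\mathbb{R}^6$ as $(\xi_1,\eta_1,\xi_2,\eta_2,\xi_3,\eta_3)$; the function name `on_singularity` and the sample values are illustrative only. It tests the two parametrizations stated later in this section, for the nilpotent Jordan block and for the diagonal matrix with $\lambda=-1$.

```python
def on_singularity(B, point):
    """Check the four linear equations of the singularity for a 2x2 matrix B.

    B is given as ((b11, b12), (b21, b22)) and point as
    (xi1, eta1, xi2, eta2, xi3, eta3).
    """
    (b11, b12), (b21, b22) = B
    xi1, eta1, xi2, eta2, xi3, eta3 = point
    return (
        xi1 + xi2 + xi3 == 0                      # support of the multiplier
        and eta1 + eta2 + eta3 == 0               # support of the multiplier
        and xi1 + b11 * xi2 + b21 * eta2 == 0     # first kernel equation
        and eta1 + b12 * xi2 + b22 * eta2 == 0    # second kernel equation
    )


# Jordan block with spectrum {0}: B(t, s) = (s, 0).
B_NILPOTENT = ((0, 1), (0, 0))
# Diagonal matrix with spectrum {1, -1}: B(t, s) = (-t, s).
B_DIAGONAL = ((-1, 0), (0, 1))

SAMPLES = [(a, b) for a in (-2, 0, 3) for b in (-1, 5)]

nilpotent_ok = all(
    on_singularity(B_NILPOTENT, (0, a, -a, b, a, -a - b)) for a, b in SAMPLES
)
diagonal_ok = all(
    on_singularity(B_DIAGONAL, (a, b, a, -b, -2 * a, 0)) for a, b in SAMPLES
)
```

Integer samples keep the comparison exact; a point off the two dimensional subspace, such as `(1, 0, 0, 0, -1, 0)` for the diagonal matrix, fails the third equation.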

We will distinguish the following cases, in each of which the singularity will be two dimensional.

• Case 1.
\[ \{0,1\}\cap\mathrm{Spec}(B) = \emptyset. \]
In this case our operator is completely non-degenerate. Its analysis is an adaptation of the one dimensional theory to the two dimensional context much in the spirit of [15]. See Section 5.

• Case 2.
\[ \mathrm{Spec}(B) = \{0\}. \]
In this case, by the Jordan canonical form theorem, $B$ will be similar to either
\[ \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]
In the first case we get
\[ \Lambda_B(F_1,F_2,F_3) := \int_{\mathbb{R}^4} F_1(x+t,y+s)\,(F_2F_3)(x,y)\,K(t,s)\,dx\,dy\,dt\,ds. \]
As in the case of the one dimensional bilinear Hilbert transform, thanks to the full and uniform degeneracy, we immediately conclude that $\Lambda_B$ is bounded on $L^{p_1}\times L^{p_2}\times L^{p_3}$ if and only if$^3$ $\frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}=1$, $1<p_1<\infty$ and $1<p_2,p_3\le\infty$. This follows from the well known two dimensional singular integral theory.

In the second case, $\Lambda_B$ takes the form
\[ \Lambda_B(F_1,F_2,F_3) := \int_{\mathbb{R}^4} F_1(x+t,y+s)\,F_2(x+s,y)\,F_3(x,y)\,K(t,s)\,dx\,dy\,dt\,ds. \]
We prove its boundedness in Section 4. The singularity can be parametrized as
\[ \{(0,a,-a,b,a,-a-b) : a,b\in\mathbb{R}\} \]
and one can easily see that neither $(-a,b)$ nor $(a,-a-b)$ is the graph over $(0,a)$.

2 The first two equations describe the support of the multiplier
3 We will ignore the endpoint $L^1$ results

• Case 3.
\[ \mathrm{Spec}(B) = \{1\}. \]
This is the case symmetric to Case 2, and we will encounter the same possibilities. By the Jordan canonical form theorem, $B$ will be similar to either
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]
The change of coordinates $(x+t,y+s)\to(x,y)$ shows that these subcases correspond to the two subcases of Case 2.
• Case 4.
\[ \mathrm{Spec}(B) = \{1,\lambda\},\quad \lambda\notin\{0,1\}. \]
In this case $B$ is similar to
\[ \begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix}. \]
The singularity can be parametrized (written here for $\lambda=-1$, the value used in Section 3) as
\[ \{(a,b,a,-b,-2a,0) : a,b\in\mathbb{R}\} \]
and one can easily see that neither $(a,b)$ nor $(a,-b)$ is the graph over $(-2a,0)$. We address this case in detail in Section 3.

• Case 5.
\[ \mathrm{Spec}(B) = \{0,\lambda\},\quad \lambda\notin\{0,1\}. \]
This gives the same possibilities as in Case 4, for the same reason that Case 3 and Case 2 are equivalent.

• Case 6.
\[ \mathrm{Spec}(B) = \{0,1\}. \]
In this case $B$ is similar to
\[ \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \]
and after substituting $y+s$ by $y$ and renaming things, the form is equivalent to
\[ \Lambda_B(F_1,F_2,F_3) := \int_{\mathbb{R}^4} F_1(x+t,y)\,F_2(x,y+s)\,F_3(x,y)\,K(t,s)\,dx\,dy\,dt\,ds. \]
The singularity can be parametrized as
\[ \{(0,a,b,0,-a,-b) : a,b\in\mathbb{R}\}, \]
and it can easily be seen that more degeneracies are present here. The methods we develop in this paper do not seem by themselves sufficient to address this very interesting and highly degenerate case. We hope that a further refinement of our techniques will tackle this problem.
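The six cases depend only on whether $0$ and $1$ are eigenvalues of $B$, so they can be read off from the trace and determinant. The sketch below is an illustration under that observation, not a procedure taken from the paper; the function name `classify` is hypothetical, and exact rational arithmetic is used so that nilpotent Jordan blocks are not misclassified by rounding.

```python
from fractions import Fraction


def classify(B):
    """Return the case number (1 to 6) of a real 2x2 matrix B = ((a, b), (c, d)).

    The characteristic polynomial is z^2 - tr*z + det, so 0 is an eigenvalue
    iff det == 0 and 1 is an eigenvalue iff 1 - tr + det == 0.
    """
    (a, b), (c, d) = [[Fraction(v) for v in row] for row in B]
    tr = a + d
    det = a * d - b * c
    has_zero = det == 0
    has_one = 1 - tr + det == 0

    if not has_zero and not has_one:
        return 1                      # {0, 1} and Spec(B) are disjoint
    if has_zero and has_one:
        return 6                      # Spec(B) = {0, 1}
    if has_zero:
        # Eigenvalues are 0 and tr.
        return 2 if tr == 0 else 5    # Spec = {0}, or {0, lambda}
    # Eigenvalues are 1 and det.
    return 3 if det == 1 else 4       # Spec = {1}, or {1, lambda}


case_nilpotent = classify(((0, 1), (0, 0)))     # second subcase of Case 2
case_section_3 = classify(((-1, 0), (0, 1)))    # lambda = -1
case_rotation = classify(((0, -1), (1, 0)))     # complex spectrum
```

A matrix with non-real eigenvalues, such as the rotation above, lands in Case 1 because neither $0$ nor $1$ is a root of its characteristic polynomial.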

Due to the degeneracies present in the operators we investigate, the traditional two dimensional$^4$ decompositions are ineffective, in that the associated model sums fail to be bounded.

The main novelty of our approach in this paper lies in the use of one and a half dimensional$^5$ phase-plane projections. We exemplify this approach in Section 3 for
\[ B = \begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix} \]
and then briefly explain in Section 4 how our techniques also address the case
\[ B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]

4 Here both phase and space are thought of as each representing one dimension
5 The ambient space for the phase dimension is $\mathbb{R}^2$; decompositions, projections and various structures like tiles, trees etc., will be referred to as one and a half dimensional if they live in $\mathbb{R}^2\times\mathbb{R}$


Finally, we point out the fact that the operators we investigate contain classical one dimensional operators with modulation invariance. We emphasize two important instances.

First, perhaps less surprisingly, the boundedness of the operator analyzed in Section 3.2 implies the boundedness of the one dimensional Bilinear Hilbert Transform, in some range. To see this, transfer first the result from $\mathbb{R}^2$ to the square torus $\mathbb{T}^2$. Then use $F_1(x,y)=f(x)$ and $F_2(x,y)=g(x)$, $\Phi=\chi_{[-1,1]}$, while $\Psi$ is an appropriate function which decomposes the kernel $1/t$.

Second, and quite strikingly, the boundedness of the operator analyzed in Section 4 implies the boundedness of the Carleson operator, in some range. In short, Carleson's operator is defined by
\[ C(f)(y) := \sup_{N\in\mathbb{R}}\Big|\int f(y+s)\,e^{iNs}\,\frac{ds}{s}\Big|. \]
It suffices now to choose $F_1(x,y)=f(y)$, $F_2(x,y)=e^{ixN(y)}g(y)$ and $F_3(x,y)=e^{-ixN(y)}h(y)$ with $\|g\|_{p_2}=\|h\|_{p_3}=1$, $\Psi=\chi_{[-1,1]}$, and $\Phi$ an appropriate function which decomposes the kernel $1/t$, and to localize the estimates in Section 4.
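The algebra behind this choice is that the $x$-dependent phases cancel in the product $F_2(x+s,y)F_3(x,y)$ appearing in the form of Section 4, leaving the Carleson phase $e^{isN(y)}$. A minimal numerical sketch of that cancellation follows; the particular functions `N`, `g` and `h` are arbitrary stand-ins chosen for illustration, not objects from the paper.

```python
import cmath
import math


def N(y):
    """Hypothetical measurable frequency selector y -> N(y)."""
    return 3.0 * math.sin(y) + 1.0


def g(y):
    return 1.0 / (1.0 + y * y)


def h(y):
    return math.exp(-y * y)


def F2(x, y):
    return cmath.exp(1j * x * N(y)) * g(y)


def F3(x, y):
    return cmath.exp(-1j * x * N(y)) * h(y)


def product_in_form(x, y, s):
    # F_2(x + s, y) F_3(x, y), the factor appearing in the trilinear form.
    return F2(x + s, y) * F3(x, y)


def carleson_phase(y, s):
    # The x-independent expression e^{i s N(y)} g(y) h(y).
    return cmath.exp(1j * s * N(y)) * g(y) * h(y)


max_error = max(
    abs(product_in_form(x, y, s) - carleson_phase(y, s))
    for x in (-1.5, 0.0, 2.0)
    for y in (-0.7, 0.3, 1.1)
    for s in (-2.0, 0.5, 4.0)
)
```

The discrepancy `max_error` is at the level of floating point rounding, reflecting that the identity is exact.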

While this may appear as yet another proof of Carleson's classical theorem [4], the 
argument of this paper in the special case above reduces largely to the proof of Carleson's 
theorem in [12]. But the approach in the current paper is further evidence for a unified 
proof for bounds of the bilinear Hilbert transform and Carleson's operator, following up 
on the analogy that was stressed in [12]. 

We refer to the last section for an ergodic theoretic perspective. 



3. The Cases 4 and 5

We will analyze the trilinear form associated with
\[ B = \begin{pmatrix} \lambda & 0 \\ 0 & 1 \end{pmatrix}, \]
where $\lambda\notin\{0,1\}$. All values of $\lambda\notin\{0,1\}$ are entirely typical; however, to minimize the number of parameters and to ease the exposition we will assume $\lambda=-1$. We thus look at
\[ \Lambda(F_1,F_2,F_3) = \int_{\mathbb{R}^4} F_1(x+t,y+s)\,F_2(x-t,y+s)\,F_3(x,y)\,K(t,s)\,dx\,dy\,dt\,ds. \]
More precisely, we prove
More precisely, we prove 



Theorem 3.1. For each $2<p_i<\infty$ with $\frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}=1$, and each $F_i\in L^{p_i}(\mathbb{R}^2)$ we have
\[ |\Lambda(F_1,F_2,F_3)| \lesssim \prod_{i=1}^{3}\|F_i\|_{p_i}. \]



We remark that we find it likely that a more refined analysis$^6$ can push the range of validity of Theorem 3.1 to all $1<p_i<\infty$ satisfying $\frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}=1$. We will not pursue this here.

6 In particular, one would have to eliminate some appropriate exceptional sets

A simple but important observation shows that
\[ \Lambda(F_1,F_2,F_3) = \int_{\mathbb{R}^4} F_1(x+t,y)\,F_2(x-t,y)\,F_3(x,y-s)\,K(t,s)\,dx\,dy\,dt\,ds. \]
This formulation of $\Lambda(F_1,F_2,F_3)$ anticipates one of the main features of our approach: we will not do any frequency decomposition for $F_1$ or $F_2$ in the second variable. Indeed, it is not too hard to see that a full two dimensional approach as in the non-degenerate case (see Section 5), that would amount to a full two dimensional decomposition of all $F_i$ followed by inserting absolute values on pieces of the operator associated with each multi-tile, will make the model operator unbounded on all $L^p$ spaces. We omit these details, but we mention that the "enemy" here is the fact that the form contains a pointwise product on the $y$ variable of $F_1$ and $F_2$. This pointwise product will not be decomposed any further, but will rather be ignored until the later part of the argument.
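The reformulation of $\Lambda$ is just the substitution $y+s\to y$. As a sanity check, the same substitution can be verified exactly on a finite cyclic group, where integrals become finite sums. The sketch below is a discrete analogue only, with hypothetical helper names and random integer tables standing in for $F_1,F_2,F_3$ and $K$.

```python
import random


def form_original(F1, F2, F3, K, n):
    """Discrete analogue of the form with F1(x+t, y+s) F2(x-t, y+s) F3(x, y)."""
    return sum(
        F1[(x + t) % n][(y + s) % n]
        * F2[(x - t) % n][(y + s) % n]
        * F3[x][y]
        * K[t][s]
        for x in range(n)
        for y in range(n)
        for t in range(n)
        for s in range(n)
    )


def form_shifted(F1, F2, F3, K, n):
    """Discrete analogue of the form with F1(x+t, y) F2(x-t, y) F3(x, y-s)."""
    return sum(
        F1[(x + t) % n][y]
        * F2[(x - t) % n][y]
        * F3[x][(y - s) % n]
        * K[t][s]
        for x in range(n)
        for y in range(n)
        for t in range(n)
        for s in range(n)
    )


def random_table(rng, n):
    # Integer entries keep both sums exact, so equality can be tested with ==.
    return [[rng.randint(-3, 3) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
n = 5
F1, F2, F3, K = (random_table(rng, n) for _ in range(4))
lhs = form_original(F1, F2, F3, K, n)
rhs = form_shifted(F1, F2, F3, K, n)
```

Both sums agree term by term after relabelling $y+s$ as $y$; note that in the second form $F_1$ and $F_2$ are evaluated at the same $y$, which is the pointwise product in the second variable mentioned above.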

Definition 3.2. Let $m:\mathbb{R}^d\to\mathbb{R}$ and let $D\subset\mathbb{R}^d$ be a $d$-dimensional cube with sidelength $L$. We will say that $m$ is adapted to $D$ of order $M$ if $m$ is supported in $D$ and
\[ \|\partial^{\alpha}m\|_{L^\infty} \le L^{-|\alpha|} \qquad (1) \]
for each $\alpha\in\mathbb{N}^d$ with $|\alpha|\le M$.

We will often refer to various $m$ as adapted to a certain interval in a more general sense, that is with the understanding that there is an extra (implicit) constant on the right hand side of (1). This implicit constant will not be stated, but it will always be bounded by a universal constant (i.e. $O(1)$).

In order to discretize $\Lambda(F_1,F_2,F_3)$, we first perform a "cone decomposition" of $K$, that is we decompose smoothly $K$ into pieces localized in (finitely many) cones$^7$ centered at the origin (see for example [13]). This decomposition reduces Theorem 3.1 to getting bounds for
\[ \sum_{k\in\mathbb{Z}}\int F_1(x+t,y)\,F_2(x-t,y)\,F_3(x,y-s)\,\Psi_k(t)\,\Phi_k(s)\,dx\,dy\,dt\,ds, \qquad (2) \]
where $\Psi_k(t)=\frac{1}{2^k}\Psi(t/2^k)$ and $\Phi_k(s)=\frac{1}{2^k}\Phi(s/2^k)$, and $\Psi$ and $\Phi$ are functions whose Fourier transforms are adapted to $[-1/2,1/2]$ of some large order. Moreover, we can assume at least one of $\widehat{\Psi}$ and $\widehat{\Phi}$ is supported away from $0$ and thus $\Psi$ or $\Phi$ has mean zero. This latter condition reflects the fact that the cone to which $\widehat{K}$ is restricted can not intersect both punctured (frequency) axes. To pass from any smooth cone decomposition to cone multipliers that are sums of tensor products as in (2) one can use the standard method of Fourier expansion of pieces of the cone multiplier.

The bulk of the paper (Section 3.1) is devoted to the analysis of the case where $\widehat{K}$ is restricted to a cone that does not intersect the punctured $\eta$ axis$^8$. The analysis in the case when the cone does not intersect the punctured $\xi$ axis is somewhat easier$^9$ (at least for exposition purposes), and will be presented in Section 3.2.

7 These cones will typically have the same aperture, much smaller than $\pi/2$
8 $\eta$ is the dual of the $y$ variable
9 We will also make the point that similar techniques to the ones we develop to address the first type of cone also apply to the second type of cone

3.1. The cone $\int\Phi=0$.

We will thus focus on the case $\int\Phi=0$.

3.1.1. Discretization. By using standard reductions, in order to get bounds for (2) it suffices to prove the boundedness of the model sum
\[ \sum_{Q=\omega_1\times\omega_2\times\omega_3\in\mathbf{Q}}\int\prod_{i=1}^{3}\pi_{\omega_i}F_i(x,y)\,dx\,dy \qquad (3) \]
where for $i\in\{1,2\}$, $\pi_{\omega_i}$ denotes some projection operator (acting on the $x$ variable) associated with a multiplier$^{10}$ adapted to $\omega_i$ of order $N^4$, while $\pi_{\omega_3}$ is the tensor product of a projection as above in the first coordinate and a projection as above on $[|\omega_3|,2|\omega_3|]$, in the second coordinate.
Here $\mathbf{Q}$ is a collection of frequency cubes $Q=\omega_1\times\omega_2\times\omega_3$ satisfying the following properties:

Definition 3.3.

• $\mathbf{Q}$ is a one parameter family, in that each component $\omega_i$ determines uniquely the other two components of a given $Q$.

• Each $\omega_i$ is an interval in a fixed shifted dyadic grid$^{11}$ $\mathcal{D}$.

• For each $Q$, $|\omega_i|=2^{Jj}$ for some $j\in\mathbb{Z}$, where $J\in\mathbb{N}$ is a fixed large enough natural number. Such intervals will be referred to as $J$-dyadic.

• $|\omega_i|=|\omega_i'|$ and $\omega_i\neq\omega_i'$ implies $\mathrm{dist}(\omega_i,\omega_i')\ge 2^J|\omega_i|$.

• $\omega_1=\omega_2$.

• There is a (possibly different) shifted dyadic grid $\mathcal{D}_1$ such that for each $\omega_i$, $i\in\{1,2\}$, there exists$^{12}$ $\tilde\omega_i\in\mathcal{D}_1$ such that $3000\omega_i\subset\tilde\omega_i\subset 4000\omega_i$.

• $-2\xi\in C_Q\omega_3$ whenever $\xi\in\omega_1$, where $C_Q$ is some large enough universal constant$^{13}$.

The properties above are easily achieved by stretching the intervals $\omega_i$ as needed, by an $O(1)$ factor, and by embedding them into intervals (of similar size) of a shifted dyadic grid. The procedure is completely standard; we refer the reader to [6] (see for example Section 6) and [5] for details. The sparsification induced by the constant $J$ implies that we have to deal with roughly $O(J)$ model sums like that in (3). This is however no problem, since $J=O(1)$.

We anticipate a bit the proof of the boundedness of (3), and mention that the only source of orthogonality will be the fact that $\pi_{\omega_3}$ projects in the second coordinate on intervals of the form $[|\omega_3|,2|\omega_3|]$, which are pairwise disjoint for distinct scales of $\omega_3$. The requirement $\omega_1=\omega_2$ will not generate orthogonality, and in general, we can not do better than that, that is, we can not achieve a separation condition$^{14}$ like $\omega_1=C|\omega_2|+\omega_2$. This can be easily seen in the case when $\Psi$ does not have mean zero$^{15}$ (worst case scenario).

10 That is $\pi_{\omega}F(x,y)=\int_{\mathbb{R}^2}m_{\omega}(\xi)\widehat{F}(\xi,\eta)e^{i(x\xi+y\eta)}\,d\xi\,d\eta$
11 That is the collection of intervals of the form
\[ \mathcal{D}_{M,j,L} := \Big\{ 2^{i}\Big[\,l+\frac{L}{M},\ l+1+\frac{L}{M}\Big] : i\equiv j \ (\mathrm{mod}\ M-1),\ l\in\mathbb{Z} \Big\}, \]
with $M\ge 3$ an odd integer, $0\le j\le M-2$ and $0\le L\le M-1$
12 The enlarged intervals $\tilde\omega_i$ are a technicality needed for the construction of phase space projections for overlapping trees. They are only needed for $i\in\{1,2\}$
13 This is achievable since $\mathbf{Q}$ is "close" to the plane $\xi_1+\xi_2+\xi_3=0$. See for example [5] for details. The precise positioning of each $\omega_3$ with respect to $\omega_1$ is unimportant for our considerations, since it will not affect any type of orthogonality in our argument

We will further discretize (3) this time on the spatial side, and for this we introduce 
some notation. 

Let $\eta$ denote a fixed positive function with integral 1 and with Fourier transform supported in $[-2^{-2J},2^{-2J}]$, satisfying the pointwise estimates
\[ C^{-1}(1+|x|)^{-N^2} \le \eta(x) \le C(1+|x|)^{-N^2}, \qquad (4) \]
for some large enough $C$ that may depend on $N$.

Let $\eta_j$ denote the function $\eta_j(x) := 2^{jJ}\eta(2^{jJ}x)$. For any subset $E$ of $\mathbb{R}$ or $\mathbb{R}^2$, denote by $\chi_E$ the characteristic function of $E$. If $E\subset\mathbb{R}$ we define the smoothed out characteristic function $\chi_{E,j}$ by
\[ \chi_{E,j} := \chi_E*\eta_j. \]
For a square $R=I\times J$ we will also use the notation
\[ \chi_{R,j}(x,y) = \chi_{I,j}(x)\chi_J(y). \]
Note that we smoothen out only in the first coordinate. Note also that
\[ \chi_{\bigcup_\alpha E_\alpha,\,j} = \sum_\alpha\chi_{E_\alpha,j} \qquad (5) \]
whenever $E_\alpha$ are disjoint.

Note that $\chi_{E,j}$ is a frequency-localized approximation to $\chi_E$. In fact we have the pointwise estimate
\[ |\chi_{E,j}(x)-\chi_E(x)| \le C(1+2^{jJ}\mathrm{dist}(x,\partial E))^{-N^2+1}, \qquad (6) \]
where $\partial E$ is the topological boundary of $E$.

Definition 3.4. A multi-tile $P=R_P\times Q_P$ is identified by its spatial component, a $J$-dyadic square $R_P=I_P\times J_P$ from the standard dyadic grid$^{16}$, and by its frequency component, the $J$-dyadic cube $Q_P=\omega_{P,1}\times\omega_{P,2}\times\omega_{P,3}\in\mathbf{Q}$, satisfying the property that $|I_P||\omega_{P,1}|=1$. For each such $P$, we denote by $j_P$ the integer such that $|I_P|=2^{-j_PJ}$. We will actually abuse notation and for each $J$-dyadic square $R$ will denote by $j_R$ the integer such that $R$ has sidelength $2^{-j_RJ}$. The collection of all multi-tiles is denoted with $\mathbf{P}$.

We will sometimes abuse notation and denote $\omega_{P,1}=\omega_{P,2}$ by $\omega_P$, and the enlarged intervals $\tilde\omega_{P,1}=\tilde\omega_{P,2}$ from Definition 3.3 by $\tilde\omega_P$.

If $P$ is a multi-tile, its restrictions $R_P\times\omega_P$ and $R_P\times\tilde\omega_P$ will sometimes be referred to as tiles.



14 This kind of separation condition is achievable in the case of the one dimensional Bilinear Hilbert Transform, and is the main source of orthogonality in that instance
15 However, if both $\Psi$ and $\Phi$ have mean zero, that is, if the cone does not touch either punctured frequency axis, then this extra separation can be achieved, and the argument gets significantly simpler
16 Spatial intervals which are referred to as dyadic are always understood to be in the standard dyadic grid






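From (5) we have that
\[ \sum_{Q\in\mathbf{Q}}\int\!\!\int\prod_{i=1}^{3}\pi_{\omega_i}F_i(x,y)\,dx\,dy = \sum_{P\in\mathbf{P}}\int\!\!\int\chi_{R_P,j_P}(x,y)\prod_{i=1}^{3}\pi_{\omega_{P,i}}F_i(x,y)\,dx\,dy. \]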

By incorporating all the reductions made in this section, and by invoking a limiting 
argument, Theorem 3.1 will follow if we prove the following 

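Theorem 3.5. Let $\mathbf{P}$ be an arbitrary finite collection of multi-tiles$^{17}$. Then for each $2<p_i<\infty$ we have
\[ \Big|\sum_{P\in\mathbf{P}}\int_{\mathbb{R}^2}\chi_{R_P,j_P}(x,y)\prod_{i=1}^{3}\pi_{\omega_{P,i}}F_i(x,y)\,dx\,dy\Big| \lesssim \prod_{i=1}^{3}\|F_i\|_{p_i}. \qquad (7) \]
Moreover, the implicit constant only depends on $p_i$.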

The proof of this Theorem is postponed until Section 3.1.5. There are two important 
ingredients that lie behind this proof: a localized estimate (Proposition 3.16) and a Bessel 
type inequality (Proposition 3.19). 

3.1.2. Tree selection and sizes. In this section we organize P into structured collections 
called trees. 

Definition 3.6. Let $\xi\in\mathbb{R}$ and let $R$ be a $J$-dyadic square. We define
and
\[ \tilde\omega_{\xi,R} := [\xi-500\times 2^{j_RJ},\ \xi+500\times 2^{j_RJ}]. \]
A tree $(T,\xi_T,R_T)$ is a nonempty collection $T\subset\mathbf{P}$ of multi-tiles such that for each $P\in T$ we have $R_P\subset R_T$ and $\tilde\omega_{\xi_T,R_T}\subseteq\tilde\omega_P$. The pair $(\xi_T,R_T)$ will be referred to as the top data of the tree. We will write $\omega_T$ and $\tilde\omega_T$ for $\omega_{\xi_T,R_T}$ and $\tilde\omega_{\xi_T,R_T}$.

The tree will be referred to as lacunary if $\xi_T\notin 2\omega_P$ for each $P\in T$ and non-lacunary (sometimes also referred to as overlapping) if $\xi_T\in 2\omega_P$ for each $P\in T$.

Remark 3.7. Each multi-tile $P$ gives rise both to an overlapping tree $(\{P\},c(\omega_P)^{18},R_P)$ but also to a lacunary tree $(\{P\},\xi,R_P)$, where $\xi$ can be any point in $100\omega_P\setminus 2\omega_P$.

Remark 3.8. If the tree $(T,\xi_T,R_T)$ is overlapping, we will actually have better localization in terms of the top frequency
\[ 2\tilde\omega_T\subset\tilde\omega_P \]
for each $P\in T$. This is a consequence of Definition 3.3.

Remark 3.9. Note that each tree $T$ can be decomposed as the union of one lacunary tree $T_l$ and one non-lacunary tree $T_o$, each of which has the same top data as the original tree. The distinction whether a given tree is lacunary or not will only be made with respect to the first two components (recall that $\omega_P=\omega_{P,1}=\omega_{P,2}$). With respect to the third component, a tree will always have good orthogonality behavior, and can be automatically thought of as 3-lacunary, since
\[ (2\xi_T,0)\in C\big(\omega_{P,3}\times[|\omega_{P,3}|,2|\omega_{P,3}|]\big)\setminus 2\big(\omega_{P,3}\times[|\omega_{P,3}|,2|\omega_{P,3}|]\big). \]

17 We will abuse notation here and use the same letter $\mathbf{P}$ for subcollections
18 Here and in the following, $c(\omega)$ will denote the center of the interval $\omega$
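The decomposition of a tree into a lacunary and an overlapping part depends only on whether the top frequency $\xi_T$ lies in the doubled interval $2\omega_P$. A minimal sketch follows, assuming intervals are stored as closed pairs `(lo, hi)` and that $2\omega$ means the interval with the same center and twice the length; the function names `double` and `split_tree` are hypothetical.

```python
def double(interval):
    """Return the interval with the same center and twice the length."""
    lo, hi = interval
    center = (lo + hi) / 2
    half_length = (hi - lo) / 2
    return (center - 2 * half_length, center + 2 * half_length)


def split_tree(xi_top, omegas):
    """Split the frequency intervals of a tree by the position of xi_top.

    Returns (lacunary, overlapping): the intervals omega with xi_top outside
    2*omega, and those with xi_top inside 2*omega (endpoints included, which
    is a convention chosen for this sketch).
    """
    lacunary, overlapping = [], []
    for omega in omegas:
        lo, hi = double(omega)
        if lo <= xi_top <= hi:
            overlapping.append(omega)
        else:
            lacunary.append(omega)
    return lacunary, overlapping


lacunary_part, overlapping_part = split_tree(
    0.0, [(-1.0, 1.0), (3.0, 5.0), (0.5, 1.5), (-16.0, -8.0)]
)
```

Both parts keep the top data of the original tree, so each can be treated as a tree in its own right.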

Let $\mathbf{J}_T := \{j_P : P\in T\}$. We will also denote by $j_T := j_{R_T}$. For each $j\in\mathbf{J}_T$ we denote by
\[ E_{j,T} := \bigcup_{P\in T:\,j_P=j}R_P. \]
We remark that due to Definition 3.3, for each $j\in\mathbf{J}_T$ there is exactly one $Q$ with sidelength $2^{jJ}$ such that $Q$ is the frequency component of a multi-tile $P\in T$.

For each such $j\in\mathbf{J}_T$ we define the spatial cutoffs $\chi_j$, $\tilde\chi_j$ and the Fourier cutoff $\pi_j$ as
\[ \chi_j := \chi_{E_{j,T},j} = \sum_{P\in T:\,j_P=j}\chi_{R_P,j}, \qquad (8) \]
\[ \tilde\chi_j := \sum_{P\in T:\,j_P=j}\chi_{I_P,j}(x)\times\chi_{J_P,j}(y), \qquad (9) \]
and
\[ \pi_j := \pi_{\omega_i}, \]
where $\omega_i$ is the $i$-th component of the unique $Q$ in $T$ with $|\omega_i|=2^{jJ}$.

We remark that our notation is sloppy here: the operator $\pi_j$ also depends on the parameter $i$. We will always write $\pi_j$ in combination with a function $F_i$, and the omitted index is always the one of the function $F_i$.

Definition 3.10. A tree selection process consists of choosing a tree $T_1$ from $\mathbf{P}$, then choosing a tree $T_2$ from $\mathbf{P}\setminus T_1$ and so on. That is, at the $k$-th step we choose a tree $T_k$ from $\mathbf{P}\setminus(T_1\cup\dots\cup T_{k-1})$. We shall refer to the trees $T_k$ as the selected trees.

Definition 3.11. Consider a subset $\mathbf{P}_0$ of $\mathbf{P}$ and some top data $(\xi,R)$. Then the maximal tree $T^*$ in $\mathbf{P}_0$ with top data $(\xi,R)$ is the set of all $P\in\mathbf{P}_0$ such that $\tilde\omega_{\xi,R}\subset\tilde\omega_P$ and $R_P\subset R$.

A tree selection process is called greedy, if at the $k$-th step the tree $T_k$ is maximal in $\mathbf{P}\setminus(T_1\cup\dots\cup T_{k-1})$.
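The greedy selection process can be sketched in a few lines. The code below is an illustration only and rests on several assumptions: multi-tiles are reduced to a spatial square and an enlarged frequency interval, the top interval is taken to be $[\xi-500/\ell,\ \xi+500/\ell]$ for a top square of sidelength $\ell$ (so that $2^{j_RJ}=1/\ell$), and the sequence of top data is supplied by the caller, since the text does not fix how the top data are chosen at each step. The names `MultiTile`, `maximal_tree` and `greedy_selection` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiTile:
    x0: float      # spatial square R_P = [x0, x0 + side] x [y0, y0 + side]
    y0: float
    side: float
    f_lo: float    # enlarged frequency interval of the tile
    f_hi: float


def top_interval(xi, side):
    """Frequency interval attached to top data (xi, R) with R of this sidelength."""
    half = 500.0 / side
    return (xi - half, xi + half)


def maximal_tree(tiles, xi, top_square):
    """All tiles whose square lies in the top square and whose enlarged
    frequency interval contains the top interval."""
    tx, ty, tside = top_square
    lo, hi = top_interval(xi, tside)
    return {
        P for P in tiles
        if P.f_lo <= lo and hi <= P.f_hi
        and tx <= P.x0 and P.x0 + P.side <= tx + tside
        and ty <= P.y0 and P.y0 + P.side <= ty + tside
    }


def greedy_selection(tiles, tops):
    """At each step select the maximal tree among the tiles not yet selected."""
    remaining = set(tiles)
    selected = []
    for xi, top_square in tops:
        tree = maximal_tree(remaining, xi, top_square)
        if tree:                      # trees are nonempty by definition
            selected.append(tree)
            remaining -= tree
    return selected, remaining


P1 = MultiTile(0.0, 0.0, 1.0, -2000.0, 2000.0)
P2 = MultiTile(0.0, 0.0, 0.5, -4000.0, 4000.0)
P3 = MultiTile(5.0, 5.0, 1.0, -2000.0, 2000.0)
tops = [(0.0, (0.0, 0.0, 1.0)), (0.0, (5.0, 5.0, 1.0)), (0.0, (0.0, 0.0, 1.0))]
selected_trees, leftover = greedy_selection([P1, P2, P3], tops)
```

In this example the first top square captures `P1` and `P2`, the second captures `P3`, and the repeated third top yields nothing because its tiles were already removed, which is the point of selecting from the remainder at each step.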

The fact that trees are selected by a greedy selection algorithm will imply regularity, as expressed by Lemma 3.29. This in turn will be used repeatedly in the estimates for the phase-space projections in Proposition 3.37; in particular, it will ensure that various contributions coming from different scales are summable.

We will use the notation 

*-c(/)] w 



Xi(x) = (1 + 
and 



1/1 



xn(x,y) = xi R (x)xj R (y)- 

Definition 3.12. Let $F_i$ be an $L^2$ function and let $(T,\xi_T,R_T)$ be a tree.

We first address the case $i\in\{1,2\}$. For each $P\in T$ we introduce the following notation
\[ \|F_i\|_P := \sup_{m_P}\|\tilde\chi_{R_P}(x,y)\,T_{m_P}(F_i(\cdot,y))(x)\|_{L^2_{x,y}}, \]
where $m_P$ ranges over all functions adapted to $\omega_P$ of order $N^2$.
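If the tree is lacunary then we define its $i$-size $\mathrm{size}_i(T)$ by
\[ \mathrm{size}_i(T) := \Big(\frac{1}{|R_T|}\sum_{P\in T}\|F_i\|_P^2\Big)^{1/2}. \]
If the tree is overlapping then we define its $i$-size $\mathrm{size}_i(T)$ by
\[ \mathrm{size}_i(T) := \frac{1}{|R_T|^{1/2}}\sup_{m_T}\|\tilde\chi_{R_T}(x,y)\,T_{m_T}(F_i(\cdot,y))(x)\|_{L^2_{x,y}}, \]
where $m_T$ ranges over all functions adapted to $10\omega_T$ of order $N^2$ which also vanish at some point of $10\omega_T$.

For each $P\in T$ we also introduce the following notation
\[ \|F_3\|_{P,3} := \sup_{m_P}\|\tilde\chi_{R_P}(x,y)\,T_{m_P}(F_3(x,y))\|_{L^2_{x,y}}, \]
where $m_P$ ranges over all functions adapted to $\omega_{P,3}\times[|\omega_{P,3}|,2|\omega_{P,3}|]$ of order $N^2$.
We now define the 3-size $\mathrm{size}_3(T)$ by
\[ \mathrm{size}_3(T) := \Big(\frac{1}{|R_T|}\sum_{P\in T}\|F_3\|_{P,3}^2\Big)^{1/2}. \]

It turns out that controlling the model sum associated with one tree requires a slightly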
It turns out that controlling the model sum associated with one tree requires a slightly 
stronger notion of size. 

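Definition 3.13. Let $\mathbf{P}_0\subset\mathbf{P}$ be a finite collection of multi-tiles.
If $i\in\{1,2\}$ then we define its maximal overlapping $i$-size by
\[ \mathrm{size}_i^{o}(\mathbf{P}_0) := \sup_{(T,\xi_T,R_T):\ T\subset\mathbf{P}_0}\mathrm{size}_i(T), \]
where $(T,\xi_T,R_T)$ runs over all overlapping trees with $T\subset\mathbf{P}_0$. Similarly define the maximal lacunary $i$-size $\mathrm{size}_i^{l}(\mathbf{P}_0)$ by restricting the supremum to lacunary trees. The maximal $i$-size $\mathrm{size}_i^{*}(\mathbf{P}_0)$ of $\mathbf{P}_0$ is taken to be the largest of $\mathrm{size}_i^{o}(\mathbf{P}_0)$ and $\mathrm{size}_i^{l}(\mathbf{P}_0)$.
Finally, define the maximal 3-size of $\mathbf{P}_0$ by
\[ \mathrm{size}_3^{*}(\mathbf{P}_0) := \sup_{(T,\xi_T,R_T):\ T\subset\mathbf{P}_0}\mathrm{size}_3(T), \]
where $(T,\xi_T,R_T)$ runs over all trees (lacunary or overlapping) with $T\subset\mathbf{P}_0$.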

Remark 3.14. The size depends on the input function Fi, however, to simplify notation 
we will ignore this dependence. It will always be clear from the context what function is 
associated with a given size. 

Remark 3.15. Note that the overlapping size controls phase-space projections onto 3 di- 
mensional boxes, which might in principle be much thinner than a tile. This component 
of the maximal size is merely a technicality needed to control the norm of the phase-space 
projection onto an overlapping tree. It will come into the picture through estimate (39). 

The way we will prove Theorem 3.5 is by first proving the following local estimate. 

Proposition 3.16. Let $(T,\xi_T,R_T)$ be a tree selected by a greedy algorithm. Let $F_i$ be test functions on $\mathbb{R}^2$ satisfying
\[ \|F_i\|_{L^\infty}\le 1 \qquad (10) \]
and let $0\le\theta_1,\theta_2<1$ and $0\le\theta_3\le 1$. Then we have
\[ \Big|\sum_{P\in T}\int_{\mathbb{R}^2}\chi_{R_P,j_P}\prod_{i=1}^{3}\pi_{\omega_{P,i}}F_i\Big| \lesssim |R_T|\prod_{i=1}^{3}\mathrm{size}_i^{*}(T)^{\theta_i}. \qquad (11) \]

We postpone the proof of Proposition 3.16 to Section 3.1.4. 

3.1.3. The paraproduct estimate. In this section we prove the following global version of 
Proposition 3.16. 

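Proposition 3.17. Let $\xi\in\mathbb{R}$ and let $\mathbf{Q}'\subset\mathbf{Q}$ be a finite collection of frequency cubes $Q=\omega_1\times\omega_2\times\omega_3 := \omega\times\omega\times\omega_3$ with the property that $\xi\in\tilde\omega$ for each $Q\in\mathbf{Q}'$. Then
\[ \Big|\sum_{Q\in\mathbf{Q}'}\int_{\mathbb{R}^2}\prod_{i=1}^{3}\pi_{\omega_i}F_i\Big| \lesssim \prod_{i=1}^{3}\|F_i\|_{p_i} \qquad (12) \]
for each $1<p_i<\infty$ satisfying $\frac{1}{p_1}+\frac{1}{p_2}+\frac{1}{p_3}=1$. Moreover, the implicit constant in (12) is independent of $\mathbf{Q}'$ and $F_i\in L^{p_i}(\mathbb{R}^2)$, and it only depends on $p_i$.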

Proof. Note that since $\xi\in\tilde\omega$ the cubes have distinct scales. For $i\in\{1,2\}$ denote by $m_{\omega_i}$ the multiplier associated with the one dimensional projection $\pi_{\omega_i}$. We split $\mathbf{Q}'$ in two collections. The first collection $\mathbf{Q}'_1$ will consist of those $Q$ for which (at least one of) $m_{\omega_i}$ vanishes at $\xi$. The proof of (12) immediately follows in this case by estimating
\[ \Big|\sum_{Q\in\mathbf{Q}'_1}\prod_{i=1}^{3}\pi_{\omega_i}F_i\Big| \]
by the product of square functions on the $i$-th and third component and a maximal function on the remaining component.

The second collection Q' 2 will consist of those Q for which none of m Ui vanishes at £. 
It follows that £ £ d for each Q e Q 2 - Let ^ be a function which equals 1 on the interval 
centered at £ with length 1000C and vanishes outside the double of that interval. We 
also denote by 

Mv) = ^(«/2 fc ). 

By writing m Wj = m^.^)^^! + (m Ui — m UJi (O^Vl) an d by the discussion in the previous 
case, it follows that it suffices to prove (12) with m^^j^i replacing m LOi . Since \\m Ui < 
1, it further follows that it suffices to prove the following more general estimate 



\[ \Big|\int_{\mathbb{R}^2}\sum_{k\in\mathbb{Z}}a_k\,(T_kF_1)(T_kF_2)\,\pi_kF_3\Big| \lesssim \prod_{i=1}^{3}\|F_i\|_{p_i}, \qquad (13) \]
where $T_k$ is the one dimensional projection associated with $\psi_{kJ}$, $\pi_k$ is a two dimensional projection associated with a multiplier adapted to $[2\xi-2^{kJ+10}C,\ 2\xi+2^{kJ+10}C]\times[2^{kJ},2^{kJ+1}]$ and $a_k$ is a sequence bounded in absolute value by 1. Moreover, the implicit constant in (13) will only depend on $p_i$.
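
To prove (13), write
\[ T_kF = F-\sum_{k'>k}S_{k'}F, \]
where $S_{k'} := T_{k'}-T_{k'-1}$. It remains to control four terms, namely
\[ \Big|\int_{\mathbb{R}^2}\sum_{k\in\mathbb{Z}}a_k\Big(\sum_{k'>k}S_{k'}F_1\Big)F_2\,\pi_kF_3\Big|, \qquad (14) \]
\[ \Big|\int_{\mathbb{R}^2}\sum_{k\in\mathbb{Z}}a_kF_1\Big(\sum_{k'>k}S_{k'}F_2\Big)\pi_kF_3\Big|, \qquad (15) \]
\[ \Big|\int_{\mathbb{R}^2}\sum_{k\in\mathbb{Z}}a_kF_1F_2\,\pi_kF_3\Big| \qquad (16) \]
and
\[ \Big|\int_{\mathbb{R}^2}\sum_{k\in\mathbb{Z}}a_k\Big(\sum_{k'>k}S_{k'}F_1\Big)\Big(\sum_{k'>k}S_{k'}F_2\Big)\pi_kF_3\Big|. \qquad (17) \]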
Let us take a look first at the term in (17). Due to frequency support it equals 

I E / E(^ F i)(^ F 2)(Ea fe vr fc F 3 )|. 
h,he{-i,o,i} Jr2 fc'ez fc<fc' 

Each of the nine terms corresponding to various values of l\,l 2 is easily bounded by the 
product of square functions on the first two functions and the maximal truncation of a 
two dimensional singular integral on the third function. The estimate then follows from 
the well known boundedness of these two operators. 

The term (14) is estimated by the same argument, upon noting that for $k'>k$
\[ \Big|\int_{\mathbb{R}^2}(S_{k'}F_1)F_2\,\pi_kF_3\Big| = \Big|\int_{\mathbb{R}^2}(S_{k'}F_1)(S_{k'+1}F_2+S_{k'}F_2+S_{k'-1}F_2)\,\pi_kF_3\Big|. \]
A similar argument works for (15).

The proof of (16) is immediate from the boundedness of the two dimensional singular integral operator
\[ \sum_{k\in\mathbb{Z}}a_k\pi_k. \]


3.1.4. Proof of Proposition 3.16. The proof of Proposition 3.16 relies on Proposition 3.17 
and on the considerations in Section 3.1.7, mostly on Proposition 3.37. 

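By using standard manipulations like in Section 7 from [14], based on the triangle inequality, (6), Lemma 3.29, Lemma 3.35 and Hölder's inequality, one can easily reduce Proposition 3.16 to proving
\[ \Big|\sum_{j\in\mathbf{J}_T}\int_{\mathbb{R}^2}\prod_{i=1}^{2}(\chi_j\pi_jF_i)\,\tilde\chi_j\pi_jF_3\Big| \lesssim |R_T|\prod_{i=1}^{3}\mathrm{size}_i^{*}(T)^{\theta_i}. \qquad (18) \]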
i=i 

The novelty of (18) is that it has the spatial cutoffs attached to each function. Note that we use both $\chi_j$ and $\tilde\chi_j$. There is no reason to smoothen out a spatial component that is not correlated with a frequency localization. The smoothing is used to preserve frequency localization.

By scale invariance it suffices to assume that $|R_T|=1$, while by modulation symmetry we can also assume that the tree sits near the origin, that is $\xi_T=0$. We may further assume that the tree is either lacunary or overlapping, see Remark 3.9. These reductions place us in the setting of Section 3.1.7 so we have all the results in that section at our disposal.

The proof of (18) will follow precisely the same lines as the proof of Proposition 7.1 
in [14]. We briefly describe the strategy. One first uses the estimates from Proposition 
3.37 on how well phase-space projections approximate functions on a tree, (more precisely, 
(44), (45) and (46), depending on whether the tree is lacunary or overlapping), to estimate 

n 2 /i 2 3 

jtGJt i=l jGJ T 2 i=l i=l 

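where $\Pi_T(F_i)$ denotes the phase-space projection of $F_i$ on the tree $T$. Then one uses (43) and (47) to further bound
\[ \Big|\sum_{j\in\mathbf{J}_T}\int_{\mathbb{R}^2}\Big(\prod_{i=1}^{2}\chi_j\pi_j\Pi_T(F_i)\Big)\tilde\chi_j\pi_j\Pi_T(F_3)\Big| \]
by
\[ |R_T|\prod_{i=1}^{3}\mathrm{size}_i^{*}(T)^{\theta_i}. \]
We omit the details.
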

3.1.5. Deducing Theorem 3.5 from Proposition 3.16. In this section we state a Bessel type 
inequality that will allow us to deduce Theorem 3.5 from Proposition 3.16 

The idea is to break $\mathbf{P}$ into collections of trees $T$, such that one has control on both the maximal sizes $\mathrm{size}_i^{*}(T)$ and on the $L^1$ norm of the counting function, $\sum_T|R_T|$.

The selection of the trees is done by a greedy selection process, which will be defined in various steps. We need the following definition:

Definition 3.18. Call a tree convex, if it is a selected tree in a greedy selection process. Call a subset $\mathbf{P}_0\subset\mathbf{P}$ convex, if it is of the form $\mathbf{P}\setminus(T_1\cup\dots\cup T_k)$ where $T_1,\dots,T_k$ are the selected trees of a greedy selection process.

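Proposition 3.19. Let $1\le i\le 3$, $\lambda>0$, and suppose that $\mathbf{P}_0$ is a convex collection of multi-tiles such that
\[ \mathrm{size}_i^{*}(\mathbf{P}_0)\le 2\lambda. \qquad (19) \]
Then there exists a collection $\mathcal{F}$ of pairwise disjoint convex trees in $\mathbf{P}_0$ such that for each $\epsilon>0$ we have
\[ \sum_{T\in\mathcal{F}}|R_T| \lesssim_{\epsilon} \lambda^{-2-\epsilon}\|F_i\|_{2+\epsilon}^{2+\epsilon} \qquad (20) \]
if $i\in\{1,2\}$ and
\[ \sum_{T\in\mathcal{F}}|R_T| \lesssim \lambda^{-2}\|F_3\|_2^2 \qquad (21) \]
if $i=3$, and the remainder set $\mathbf{P}_0' := \mathbf{P}_0\setminus\bigcup_{T\in\mathcal{F}}T$ is convex and satisfies
\[ \mathrm{size}_i^{*}(\mathbf{P}_0')\le\lambda. \qquad (22) \]
We postpone the proof of this key proposition to the next section.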

We continue by noting that by using multilinear interpolation (see [8]) it suffices to
prove Theorem 3.5 under the assumption that $F_i = \chi_{E_i}$ are characteristic functions of
sets of finite measure.

Starting with $m$ large and working downward, applying Proposition 3.19 for each
$1 \le i \le 3$ and for each $m$, we obtain

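Corollary 3.20. Let $\epsilon > 0$ be fixed. For every integer $m$ there exists a collection $\mathcal{F}_m$ of
pairwise disjoint convex trees in $\mathbf{P}$ such that we have the size estimate

$$\mathrm{size}_i^*\Big(\bigcup_{T\in\mathcal{F}_m} T\Big) \lesssim (|E_i|\,2^m)^{\frac{1}{2+\epsilon}} \tag{23}$$

for all $1 \le i \le 2$ and $m \in \mathbb{Z}$,

$$\mathrm{size}_3^*\Big(\bigcup_{T\in\mathcal{F}_m} T\Big) \lesssim (|E_3|\,2^m)^{\frac{1}{2}} \tag{24}$$

for all $m \in \mathbb{Z}$, the counting function estimate

$$\sum_{T\in\mathcal{F}_m} |R_T| \lesssim 2^{-m} \tag{25}$$

for all $m \in \mathbb{Z}$, and the partitioning

$$\mathbf{P} = \mathbf{P}_2 \cup \bigcup_{m\in\mathbb{Z}}\ \bigcup_{T\in\mathcal{F}_m} T, \tag{26}$$

where $\mathbf{P}_2$ is a subset of $\mathbf{P}$ with $\mathrm{size}_i^*(\mathbf{P}_2) = 0$ for all $1 \le i \le 3$.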

We also need the following estimate on the maximal size by the Hardy-Littlewood 
maximal function. 

Lemma 3.21. For all $1 \le i \le 3$ and all $F_i \in L^\infty(\mathbb{R}^2)$ we have

$$\mathrm{size}_i^*(\mathbf{P}) \lesssim \|F_i\|_\infty.$$

Proof. It suffices to bound by $C\|F_i\|_\infty$ the quantities $\mathrm{size}_i(T)$ for $i \in \{1,2\}$ and $T$ either
overlapping or lacunary, and $\mathrm{size}_3(T)$ for general $T$. The estimate for $i = 3$ is entirely
classical (see for example the proof of its one dimensional analog, Proposition 6.3 in
[14]). The estimates for $i \in \{1,2\}$ follow by applying a similar argument on fibers above
each $y$. Let us take for example a lacunary tree $T$ (this is the harder case). Denote by
$\mathcal{I} := \{I_P : P \in T\}$. Note that

/ V /2 



size,-(T) < 



1/2 

/ I / I . . . \ 

< 



(iff / ^t^^E^pII^ ^)^^^-^))^)!!^^ 

^1 j^xS^ll^^^lli-V 



1/2 

< 

< IIF-II 

where the penultimate estimate follows from the aforementioned one dimensional result
applied to each function $F_i(\cdot, y)$. ■

We now have all the pieces ready to prove Theorem 3.5. Choose some $\epsilon < \min\{p_1 - 2,\ p_2 - 2\}$.
In the $i = 3$ case we apply Lemma 3.21, (24), and the fact that $F_3$ is a characteristic
function to obtain

$$\mathrm{size}_3^*\Big(\bigcup_{T\in\mathcal{F}_m} T\Big) \lesssim \min(2^{m/2}|E_3|^{1/2},\ 1). \tag{27}$$

Applying (26) we may estimate the left-hand side of (7) by



e e ie / / x^n^s**^ 



mez Tef m PeT i=i 

(Observe that the set P 2 gives no contribution, e.g. by an appropriate application of 
Proposition 3.16.) 

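We may apply Proposition 3.16 with $\theta_i := \frac{2+\epsilon}{p_i}$ for $1 \le i \le 2$ and $\theta_3 = 1$ and estimate
the previous expression by

$$\lesssim \sum_{m\in\mathbb{Z}}\ \sum_{T\in\mathcal{F}_m} |R_T| \Big(\prod_{i=1}^{2}\mathrm{size}_i(T)^{\theta_i}\Big)\,\mathrm{size}_3(T).$$

By (23), (21), (27) and then (25), we may estimate this by

$$\lesssim \sum_{m\in\mathbb{Z}} 2^{-m}\Big(\prod_{i=1}^{2}(2^m|E_i|)^{1/p_i}\Big)\min(2^{m/2}|E_3|^{1/2},\ 1).$$

By using the fact that $1/p_1 + 1/p_2 + 1/p_3 = 1$ this simplifies to

$$\Big(\prod_{i=1}^{2}|E_i|^{1/p_i}\Big)\sum_{m\in\mathbb{Z}}\min(2^{m/2-m/p_3}|E_3|^{1/2},\ 2^{-m/p_3}).$$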

Performing the $m$ summation we obtain the desired estimate, namely that the left-hand side
of (7) is bounded by a constant multiple of

$$\prod_{i=1}^{3}|E_i|^{1/p_i},$$

and conclude Theorem 3.5.
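The final $m$ summation can be sanity-checked numerically. The following is a minimal sketch, not part of the argument: it truncates the sum over $m \in \mathbb{Z}$ to a finite window and compares it with $|E_3|^{1/p_3}$. The function names and the cutoff are illustrative assumptions, and the check presumes $2 < p_3 < \infty$, which is what makes both tails of the series geometric.

```python
import math


def dyadic_sum(E3: float, p3: float, cutoff: int = 400) -> float:
    """Truncated version of sum over m of min(2^{m/2 - m/p3} |E3|^{1/2}, 2^{-m/p3}).

    The cutoff is a hypothetical truncation parameter; for 2 < p3 < infinity
    both tails decay geometrically, so the truncation error is negligible.
    """
    total = 0.0
    for m in range(-cutoff, cutoff + 1):
        growing = 2.0 ** (m / 2 - m / p3) * math.sqrt(E3)  # dominates for m -> -infinity
        decaying = 2.0 ** (-m / p3)                        # dominates for m -> +infinity
        total += min(growing, decaying)
    return total


def normalized_sum(E3: float, p3: float) -> float:
    """Ratio of the dyadic sum to |E3|^{1/p3}; it should stay bounded in E3."""
    return dyadic_sum(E3, p3) / E3 ** (1.0 / p3)
```

The two branches of the minimum cross where $2^m \approx |E_3|^{-1}$, and at that point both equal roughly $|E_3|^{1/p_3}$, which is why the normalized ratio stays between fixed constants as $|E_3|$ varies.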

3.1.6. The proof of Proposition 3.19. We start this section by recalling a few results from 
[5]. So far, we have worked with one and a half dimensional trees 19 . We will continue to 
reserve the name "tree" for this particular structure. In addition to this, in the following 
discussion we will also invoke some results about one dimensional trees. 

Definition 3.22. A one dimensional tile $P = I_P \times \omega_P$ is a $J$-dyadic rectangle with unit
area. A one dimensional tree $(T, \xi_T, I_T)$ with top data $(\xi_T, I_T)$ is a collection of tiles with
the property that $\omega_T \subseteq \omega_P$ and $I_P \subseteq I_T$ for each $P \in T$.

We call $(T, \xi_T, I_T)$ lacunary if for each $P \in T$ we have $\xi_T \notin 2\omega_P$.

We abuse notation here and use the same notation for one dimensional tiles and multi-tiles,
and for one dimensional trees and trees, because in our applications one dimensional tiles
will arise from multi-tiles while the one dimensional trees will be generated by reliable
trees. A reliable tree is one with the property that for distinct $P, P' \in T$ we have that
$I_P \ne I_{P'}$ (or equivalently, $I_P \times \omega_P \ne I_{P'} \times \omega_{P'}$). Thus, if $T$ is a reliable tree, then

$$\{I_P \times \omega_P : P \in T\}$$

is a one dimensional tree. We will refer to it as the one dimensional tree induced by the
tree $(T, \xi_T, R_T)$, and we will assign it the top data $(\xi_T, I_T)$.

More generally, a collection of multi-tiles $\mathbf{P}'$ will be called reliable if for distinct
$P, P' \in \mathbf{P}'$ we have that $I_P \times \omega_P \ne I_{P'} \times \omega_{P'}$.

Definition 3.23. We say that two lacunary trees $(T, \xi_T, R_T)$, $(T', \xi_{T'}, R_{T'})$ are strongly
disjoint if $T \cap T' = \emptyset$, and whenever $P \in T$, $P' \in T'$ are such that $\omega_P \subsetneq \omega_{P'}$, then one
has $R_T \cap R_{P'} = \emptyset$, and similarly with $T$ and $T'$ reversed. We define a forest to be any
collection $\mathcal{F}$ of lacunary trees such that any two distinct trees $T, T'$ in $\mathcal{F}$ are strongly
disjoint.

A similar definition holds for one dimensional trees. Note that the one dimensional tree 
induced by a reliable lacunary tree is itself lacunary. 
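For each collection $\mathcal{F}$ of trees we will denote by

$$\|\mathcal{F}\|_{BMO} := \sup_{R}\ \frac{1}{|R|}\sum_{T\in\mathcal{F}:\ R_T\subseteq R} |R_T|,$$

where the supremum is taken over all the dyadic squares $R$. Define also the counting
function

$$N_{\mathcal{F}} := \sum_{T\in\mathcal{F}} \chi_{R_T}.$$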
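To make the counting function and the BMO-type quantity concrete, here is a small sketch that computes $\|N\|_1$, $\|N\|_\infty$ and the BMO-type supremum for a finite list of dyadic squares standing in for the tops $R_T$. The encoding of a square as a triple `(k, a, b)` and all function names are illustrative assumptions. The sketch uses the elementary observation that, for a finite collection, the ratio at a dyadic square that is not itself a top is the average of the ratios of its four children, so the supremum is attained at one of the tops.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical encoding: (k, a, b) stands for the dyadic square
# [a 2^-k, (a+1) 2^-k) x [b 2^-k, (b+1) 2^-k).
Square = Tuple[int, int, int]


def area(sq: Square) -> Fraction:
    return Fraction(1, 4) ** sq[0]


def contains(big: Square, small: Square) -> bool:
    """True if the dyadic square `small` is a subset of `big`."""
    kb, ab, bb = big
    ks, a_s, b_s = small
    if ks < kb:
        return False
    shift = ks - kb
    return (a_s >> shift, b_s >> shift) == (ab, bb)


def l1_norm(tops: List[Square]) -> Fraction:
    """L^1 norm of the counting function: the total area of the tops."""
    return sum((area(t) for t in tops), Fraction(0))


def sup_norm(tops: List[Square]) -> int:
    """Largest number of tops (with multiplicity) covering a single point."""
    return max(sum(1 for t in tops if contains(t, s)) for s in tops)


def bmo_norm(tops: List[Square]) -> Fraction:
    """sup over R of |R|^{-1} * (total area of tops inside R), taken over the tops."""
    return max(
        sum((area(t) for t in tops if contains(s, t)), Fraction(0)) / area(s)
        for s in tops
    )
```

Exact rational arithmetic is used so that the three quantities can be compared without rounding issues.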

The following result from [5] shows that in order to achieve $L^1$ control over the counting
function of a collection of trees (and this is essentially what we need to prove in Proposition
3.19, see below), we are permitted to lose a logarithmic factor of $\|N_{\mathcal{F}}\|_\infty$, as long as the
argument also works for all subcollections, and localizes to a BMO version as well.

19 Trees consist of tiles, identified by a spatial component (a square) and a frequency component
(an interval). If we think about both space (in our case $\mathbb{R}^2$) and frequency (in our case $\mathbb{R}^2$) as each
representing a dimension, a tile becomes a one and a half dimensional object. We adopt the same
terminology for a tree.

Lemma 3.24. Let $\mathcal{F}$ be a collection$^{20}$ of trees such that

$$\|N_{\mathcal{F}'}\|_1 \le A\log_2(2 + \|N_{\mathcal{F}'}\|_{L^\infty}) \quad\text{and}\quad \|\mathcal{F}'\|_{BMO} \le B\log_2(2 + \|N_{\mathcal{F}'}\|_{L^\infty})$$

for all subcollections of trees $\mathcal{F}' \subseteq \mathcal{F}$. Then for each $\epsilon > 0$ we have

$$\|N_{\mathcal{F}}\|_1 \lesssim_\epsilon A B^{\epsilon},$$

where the implicit constant does not depend on $\mathcal{F}$, but only on $\epsilon$.

The next result that we recall is a variant of Proposition 13.1. from [5] (see also the 
remark following it). It asserts that the operators x}°T mp , where P ranges through the 
tiles in a one dimensional forest, are almost orthogonal, with a logarithmic loss in the L°° 
norm of the counting function. 

Proposition 3.25. Let T be a forest of one dimensional trees (T,£t, It)- Let P' := 

-ipxujpeT P be the collection of all the one dimensional tiles in the forest. For 
each P G P' let m P be a multiplier adapted to uj p of order 2. Then 



E llx}°T m p(/)ll^<log 2 (2+||Ex/ x ||^ 
PeP' Tejc 

for each f G L 2 (R). 

A standard localization argument also gives the following localized variant of Proposi- 
tion 3.25: 

Proposition 3.26. Under the same hypothesis as above, we have for each dyadic interval 

E II^T m p(/)ll^<log 2 (2+||Ex/ x IU-)ll/x? lli 2 . 
PeP'-.ipCio TeJ 7 

for each f G L 2 (R). 

The above results have been proved in [5] with the phase-space projections x}°T mp (/) 
replaced by their variant (/, 4>s) ( t > s- The proof of both Propositions 3.25 and 3.26 runs 
with no serious modifications. We leave the details to the reader. 

We will also need the following consequence of Lemma 10.4 in [5] 

Lemma 3.27. Let 71 be a collection of dyadic squares and n > 0. Then we can split 1Z 
as 71^ U 7Z b such that 

ii E *>kIi°° ~ 24n n E ^11°° 

Ren* Ren 

and 



£i«i<i£i«|. 

Re-ftb Ren 



20 This lemma is stated in [5] with the extra assumption that $\mathcal{F}$ is a forest; however, its proof in [5]
shows that $\mathcal{F}$ can be an arbitrary collection. Actually, for all practical purposes, $\mathcal{F}$ can be thought of
as merely a collection of dyadic squares $R_T$. We choose this minimal formulation, but remark that in
our application of this lemma, $\mathcal{F}$ will actually be a forest. Along the same lines, we also observe that
while the result in [5] is stated for one dimensional trees, the extension to our one and a half dimensional
setting requires no modifications.

We now have all the tools ready to prove Proposition 3.19.
We first consider the case $i \in \{1,2\}$. Fix such an $i$.

To reduce the maximal size $\mathrm{size}_i^*(\mathbf{P}_0)$ to at most $\lambda$, we need to eliminate all overlapping
and lacunary trees $T$ with $\mathrm{size}_i(T)$ exceeding $\lambda$. We will do this in a minimal manner, so
that we achieve the desired control over the counting function of the tree tops.

We first take care of the lacunary size. For each $n \ge 0$ and each square $R$ define

$$\tilde\chi_{R,n} := \tilde\chi_R\,\chi_{2^{n+1}R\setminus 2^{n}R}.$$

Call a lacunary tree $(T, \xi_T, R_T)$ upper lacunary if $\xi_T > c(\omega_P)$ for each $P \in T$ and lower
lacunary if $\xi_T < c(\omega_P)$ for each $P \in T$.

To guarantee that after the elimination process stops the lacunary size is no greater
than $\lambda$, it suffices to make sure that there are no upper or lower lacunary trees $(T, \xi_T, R_T)$
left in the collection, no $n \ge 0$ and no $m_P$ which is adapted to $\omega_P$ of order $N^2$ such that

$$\frac{1}{|R_T|}\sum_{P\in T}\big\|\tilde\chi_{R_P,n}(x,y)\,T_{m_P}(F_i(\cdot,y))(x)\big\|^2_{L^2_{x,y}} > 2^{-10n}\lambda^2. \tag{28}$$

To achieve this, we eliminate lacunary trees according to the following algorithm:

• Step 0: Set $n = 0$, $\mathbf{P}^* := \mathbf{P}_0$, $\mathcal{F}_{bad,n} = \emptyset$.

• Step 1: Select a "bad" upper lacunary tree $(T, \xi_T, R_T) \subseteq \mathbf{P}^*$, that is a tree
satisfying (28). We also make sure that $\xi_T$ is minimal over all trees with this
property$^{21}$. Put this tree in the collection $\mathcal{F}_{bad,n}$. If no such tree is available, go to
Step 4.

• Step 2: Construct the collection $C((T, \xi_T, R_T), n)$ to consist of the following convex
trees: for each $J$-dyadic square $R \subseteq 2^{n+3}R_T$ with sidelength equal to that of $R_T$
(and there are $2^{2n+6}$ of them) we let $T_R$ be the maximal tree with top data $(\xi_T, R)$;
the collection $C((T, \xi_T, R_T), n)$ will consist of all these (at most $2^{2n+6}$) trees.
Eliminate all these convex trees from $\mathbf{P}^*$, that is, reset $\mathbf{P}^* := \mathbf{P}^* \setminus \bigcup_{T'\in C((T,\xi_T,R_T),n)} T'$.

• Step 3: Go to Step 1.

• Step 4: Reset $n := n + 1$, $\mathcal{F}_{bad,n} = \emptyset$, and go to Step 1.
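The control flow of Steps 0 to 4 can be sketched on a toy model. The code below is only an illustration of the loop structure, under loudly stated simplifications: candidate trees are reduced to a hypothetical `Top` record on a one dimensional grid of equal-size squares, the test (28) is replaced by an arbitrary user-supplied predicate `is_bad`, the elimination of the maximal trees in $C((T,\xi_T,R_T),n)$ is replaced by discarding every remaining candidate whose position lies in a window of $2^{n+3}$ grid cells around the selected top, and the outer loop stops at a hypothetical `max_n` instead of running forever.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Top:
    xi: float   # stand-in for the top frequency xi_T
    pos: int    # stand-in for the position of the top square R_T on a grid
    level: int  # extra toy data that the badness predicate may use


def greedy_eliminate(
    candidates: Iterable[Top],
    is_bad: Callable[[Top, int], bool],
    max_n: int,
) -> Tuple[Dict[int, List[Top]], Set[Top]]:
    remaining = set(candidates)          # Step 0: P* := P_0
    selected: Dict[int, List[Top]] = {}
    for n in range(max_n + 1):           # Step 4 increments n
        selected[n] = []                 # F_{bad,n} := empty
        while True:
            bad = [c for c in remaining if is_bad(c, n)]
            if not bad:                  # Step 1: nothing left to select at this n
                break
            # Step 1: choose the bad candidate with minimal xi (ties broken by position).
            t = min(bad, key=lambda c: (c.xi, c.pos))
            selected[n].append(t)
            # Step 2 (toy version): discard everything in a window of 2^{n+3} cells.
            half = 2 ** (n + 2)
            remaining = {c for c in remaining if not (-half <= c.pos - t.pos < half)}
            # Step 3: loop back to Step 1.
    return selected, remaining
```

Each selection removes at least the selected candidate itself, so the inner loop terminates; this mirrors the remark that a finite collection produces no bad trees once $n$ is large.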

While the algorithm runs forever, it will produce no bad trees for large enough $n$, since $\mathbf{P}_0$
is finite. After we are done with eliminating the upper trees, if $\mathbf{P}^*$ is nonempty we repeat
the algorithm for lower trees (the only difference is that in Step 1, $\xi_T$ will be maximal).
It suffices to prove that for each $n \ge 0$ and each $\epsilon > 0$

$$\sum_{T\in\mathcal{F}_{bad,n}} |R_T| \lesssim_\epsilon 2^{-3n}\lambda^{-2-\epsilon}\|F_i\|_2^2. \tag{29}$$

We will prove this by using the results about one dimensional trees from the beginning 
of this section. 

By Lemma 3.24, to achieve (29), it suffices for each $\mathcal{F}' \subseteq \mathcal{F}_{bad,n}$ to prove the following:

$$\sum_{T\in\mathcal{F}'} |R_T| \lesssim 2^{-3n}\lambda^{-2}\|F_i\|_2^2\,\log_2(2 + \|N_{\mathcal{F}'}\|_{L^\infty}), \tag{30}$$

and

$$\frac{1}{|R|}\sum_{T\in\mathcal{F}':\ R_T\subseteq R} |R_T| \lesssim 2^{-3n}\lambda^{-2}\log_2(2 + \|N_{\mathcal{F}'}\|_{L^\infty}) \tag{31}$$

for each dyadic $R$.

21 If there are more trees with the same $\xi_T$ which qualify to be selected at a certain stage, we select
any one that maximizes $|R_T|$.

We only prove (30); then (31) will follow by localizing the techniques (in particular,
appealing to Proposition 3.26 rather than Proposition 3.25).

We apply Lemma 3.27 to the collection $\mathcal{R} := \{R_T : T \in \mathcal{F}'\}$ and to $n + 1$, and denote
by $\mathcal{F}'^{\sharp}$ and $\mathcal{F}'^{\flat}$ the two collections of trees that arise by this application. It suffices to
prove

$$\sum_{T\in\mathcal{F}'^{\sharp}} |R_T| \lesssim 2^{-3n}\lambda^{-2}\|F_i\|_2^2\,\log_2(2 + \|N_{\mathcal{F}'}\|_{L^\infty}). \tag{32}$$

Denote $N' := \|N_{\mathcal{F}'}\|_{L^\infty}$. We certainly have

$$\Big\|\sum_{T\in\mathcal{F}'^{\sharp}} \chi_{2^{n+1}R_T}\Big\|_\infty \lesssim 2^{4n}N'^3. \tag{33}$$

For each $y \in \mathbb{R}$ and each $T \in \mathcal{F}'^{\sharp}$ denote by $T_y$ the subtree consisting of all $P \in T$
such that $y \in 2^{n+1}J_P$. Split the collection of multi-tiles $\bigcup_{T\in\mathcal{F}'^{\sharp}} T_y$ into at most $2^{n+2}$
reliable subcollections. Denote them with $C_{i,y}$, $i \in I_y$, with $\# I_y \le 2^{n+2}$. Thus, each $C_{i,y}$
will consist of the union of reliable trees, each of which is a subtree of one of the trees$^{22}$
$T_y$, with $T \in \mathcal{F}'^{\sharp}$.

We claim that for each given $y$ and $i \in I_y$, the collection of the one dimensional trees
induced by the trees in $C_{i,y}$ will form a (one dimensional) forest.

To see this, note first that the induced one dimensional trees are lacunary, since the
subtree of a lacunary tree is itself lacunary.

Let now $(T_{rel}, \xi_T, R_T)$ and $(T'_{rel}, \xi_{T'}, R_{T'})$ be the subtrees of $(T, \xi_T, R_T)$ and $(T', \xi_{T'}, R_{T'})$
that are in $C_{i,y}$. The fact that their induced one dimensional trees are disjoint (as collections
of tiles) follows from the fact that $C_{i,y}$ is reliable.

The proof of the fact that the induced one dimensional trees $(T_{rel}, \xi_T, I_T)$ and $(T'_{rel}, \xi_{T'}, I_{T'})$
are strongly disjoint goes by contradiction. Assume $P \in T_{rel}$, $P' \in T'_{rel}$ are such that
$\omega_P \subsetneq \omega_{P'}$ and $I_T \cap I_{P'} \ne \emptyset$. The first condition, the upper lacunarity of both $(T, \xi_T, R_T)$
and $(T', \xi_{T'}, R_{T'})$, and the fact that $|\omega_P| \le 2^{-J}|\omega_{P'}|$ easily imply that the tree $T$ was
selected before $T'$. It also follows that $\omega_T \subseteq \omega_{P'}$. On the other hand, $I_T \cap I_{P'} \ne \emptyset$
together with the fact that $y \in 2^{n+1}J_T \cap 2^{n+1}J_{P'}$ implies that $P'$ would have qualified to
be eliminated when $T$ was eliminated, that is before the selection of $T'$ (more precisely,
$P'$ is in one of the trees in $C((T, \xi_T, R_T), n)$). The contradiction is immediate.

Since for each $y$ and $i \in I_y$, each $T \in \mathcal{F}'^{\sharp}$ contributes with at most one subtree to $C_{i,y}$,
and since $y \in 2^{n+1}J_T$ for each such $T$, it follows by (33) that the counting function for
the one dimensional forest induced by $C_{i,y}$ obeys the bound$^{23}$

$$\Big\|\sum_{T \text{ contributes to } C_{i,y}} \chi_{I_T}(x)\Big\|_{L^\infty(x)} \lesssim 2^{4n}N'^3. \tag{34}$$

22 Each $T \in \mathcal{F}'^{\sharp}$ will provide at most one such subtree for each given $C_{i,y}$.
23 This holds for a.e. $y$.

By (28) we have

E \Rt\ < 2- 19 "A- 2 / E E \\xl^y)T m AH-,y)m\\lidy 
= 2 -^x- 2 f E E \\x}l(^y)TrnAF i (-,y))(x)\\l l dy 
= 2- 19 -a- 2 /EE 

Using the fact that for each $y$ and $i$, $C_{i,y}$ induces a one dimensional forest with counting
function estimate as in (34), it follows by Proposition 3.25 that the expression above
can further be bounded by

$$2^{-18n}\lambda^{-2}\int_{y\in\mathbb{R}} \|F_i(x,y)\|^2_{L^2_x}\,\log_2(2 + 2^{4n}N'^3)\,dy \lesssim 2^{-17n}\lambda^{-2}\|F_i(x,y)\|^2_{L^2_{x,y}}\,\log_2(2 + N').$$

This ends the proof of (30), and thus of (29).

We next take care of the overlapping size. The argument is essentially the same as
before, with two differences: a simplification arises due to the fact that the contribution
to each tree arises only from one tile-like box, namely the top of the tree $R_T \times \omega_T$; there
is however also a technical complication to our argument arising from the fact that the
boxes $R_T \times \omega_T$ are not tiles, in general. In particular, $\omega_T$ is not in general an element of
the grid $\mathcal{D}_0$. We explain how to overcome this technicality below.

Recall that $\mathbf{P}^*$ is what is left of the initial $\mathbf{P}_0$, after the algorithm described above was
performed (for both upper and lower lacunary trees). Note that $\mathbf{P}^*$ is convex.

For each overlapping tree $(T, \xi_T, R_T)$ in $\mathbf{P}^*$, each $v \in 10\omega_T$ and each $s \ge 0$ define

$$\omega^{+}_{T,s,v} := [\,v + 2^{-s}\,10|\omega_T|,\ v + 2^{2-s}\,10|\omega_T|\,],$$
$$\omega^{-}_{T,s,v} := [\,v - 2^{2-s}\,10|\omega_T|,\ v - 2^{-s}\,10|\omega_T|\,].$$

Note that IOc^t \ {v} C {J s>0 (^t sv U w t s Ji anc ^ that each m T adapted to 10u; T which 
vanishes at v G 10u T can be written as 



'" i E 2 s "'t.>,,- -E 2 " 



s>0 s>0 



where sv is adapted to w^ s1J and supported in IOc^t while m T sv \s adapted to s v 
and supported in IOc^t- 
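The covering of $10\omega_T \setminus \{v\}$ by the dyadic shells around $v$ can be checked directly. The sketch below, with illustrative names and with $L$ standing for $10|\omega_T|$, returns for a frequency $\xi \ne v$ one admissible scale $s \ge 0$ and the side ($+$ or $-$) of the shell containing $\xi$; the shells overlap, so the returned $s$ is one valid choice rather than the only one.

```python
from typing import Tuple


def shell_index(xi: float, v: float, L: float) -> Tuple[int, str]:
    """Return (s, side) with 2^{-s} L <= |xi - v| <= 2^{2-s} L and s >= 0.

    L plays the role of 10|omega_T|. Any xi != v in 10 omega_T satisfies
    |xi - v| <= L, so the precondition below always holds in that situation.
    """
    d = abs(xi - v)
    if d == 0.0 or d > 4.0 * L:
        raise ValueError("need 0 < |xi - v| <= 4L")
    s = 0
    # Shrink the inner radius 2^{-s} L until it no longer exceeds the distance d.
    while 2.0 ** (-s) * L > d:
        s += 1
    return s, ("+" if xi > v else "-")
```

When the loop stops at some $s \ge 1$, the previous inner radius $2^{1-s}L$ exceeded the distance, so the distance is below the outer radius $2^{2-s}L$ of the returned shell.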

It suffices to guarantee that after the elimination process ends we are left with no
overlapping trees $(T, \xi_T, R_T)$, no $s \ge 0$, no $n \ge 0$, no $v \in 10\omega_T$ and no $m_{T,s,v}$ which
is either adapted to $\omega^{+}_{T,s,v}$ and supported in $10\omega_T$ or adapted to $\omega^{-}_{T,s,v}$ and supported in
$10\omega_T$ such that

$$\frac{1}{|R_T|^{1/2}}\big\|\tilde\chi_{R_T,n}(x,y)\,T_{m_{T,s,v}}(F_i(\cdot,y))(x)\big\|_{L^2_{x,y}} > 2^{-10n}\lambda. \tag{35}$$

The selection process goes as follows: we will run the following algorithm for each $s \ge 0$.
We first run it for $s = 0$, and then we increment the value of $s$ and run the algorithm
again.




• Step 0: Set $n = 0$, $\mathbf{P}^{**} := \mathbf{P}^*$ and $\mathcal{F}_{bad,n,s,-} = \emptyset$.

• Step 1: Select a "bad" tree $(T, \xi_T, R_T)$ in $\mathbf{P}^{**}$, that is an overlapping tree satisfying
(35) for some $v = v_T \in 10\omega_T$ and some $m_{T,s,v}$ which is adapted to $\omega^{-}_{T,s,v}$ and
supported in $10\omega_T$. Moreover, we select the tree with minimal $v_T$. Put this tree
in the collection $\mathcal{F}_{bad,n,s,-}$. If no such tree is available, go to Step 4.

• Step 2: Construct the collection $C((T, \xi_T, R_T), n, s)$ to consist of the following
convex trees: for each $J$-dyadic square $R \subseteq 2^{n+3}R_T$ with sidelength equal to
that of $R_T$ (and there are $2^{2n+6}$ of them) we let $T_R$ be the maximal tree with
top data $(\xi_T, R)$; the collection $C((T, \xi_T, R_T), n, s)$ will consist of all these (at most
$2^{2n+6}$) trees. Eliminate all these convex trees from $\mathbf{P}^{**}$, that is, reset
$\mathbf{P}^{**} := \mathbf{P}^{**} \setminus \bigcup_{T'\in C((T,\xi_T,R_T),n,s)} T'$.

• Step 3: Go to Step 1.

• Step 4: Reset $n := n + 1$, $\mathcal{F}_{bad,n,s,-} = \emptyset$, and go to Step 1.

As before, while the algorithm runs forever for each given $s$, it will produce no bad trees
for large enough $n$, since $\mathbf{P}^{**}$ is finite. After we are done with eliminating the trees for
a given $s$, if $\mathbf{P}^*$ is nonempty, we repeat the algorithm with $+$ replacing $-$ and the same
$s$ (the only difference is that in Step 1, $v_T$ will be maximal). We then increment $s$ and
repeat the above procedure.

We end up with the collections $(\mathcal{F}_{bad,n,s,-})_{n,s\ge 0}$ and $(\mathcal{F}_{bad,n,s,+})_{n,s\ge 0}$ of trees.

The Bessel inequality for the selected trees will follow by an argument very similar to
the one above for lacunary trees, and from the following observation:

If $(T, \xi_T, R_T)$ and $(T', \xi_{T'}, R_{T'})$ are trees in $\mathcal{F}_{bad,n,s,-}$ for given $s, n$, then the boxes
$\omega^{-}_{T,s,v_T} \times 2^{n+1}R_T$ and $\omega^{-}_{T',s,v_{T'}} \times 2^{n+1}R_{T'}$ are disjoint.

The proof of the above goes by contradiction. Assume the two boxes intersect. Without
loss of generality we may assume $|R_{T'}| \le |R_T|$. Let $P \in T$, $P' \in T'$ be such that$^{24}$ $\xi_T \in 2\omega_P$
and $\xi_{T'} \in 2\omega_{P'}$. We distinguish two cases:

The first possibility is that $|R_{T'}| = |R_T|$. Since $|\xi_T - \xi_{T'}| \le 20\min\{|\omega_P|, |\omega_{P'}|\}$ and due
to Remark 3.8, it follows that $\omega_T \subseteq \omega_{P'}$ and $\omega_{T'} \subseteq \omega_P$. On the other hand, we see that
since $2^{n+1}R_T \cap 2^{n+1}R_{T'} \ne \emptyset$ we have that $R_{P'} \subseteq R_{T'} \subseteq 2^{n+3}R_T$ and $R_P \subseteq R_T \subseteq 2^{n+3}R_{T'}$.
These two facts imply that if $T$ was selected first, then $P'$ would have qualified to be
eliminated at that stage, and hence it would not have been available when $T'$ was selected.
The symmetric statement also holds, and the contradiction arises.

The second possibility is that $|R_{T'}| < |R_T|$. Since $\omega^{-}_{T,s,v_T} \cap \omega^{-}_{T',s,v_{T'}} \ne \emptyset$ it easily follows
that $T$ was selected first. By reasoning as above and by using the fact that $|\omega_T| \le 2^{-J}|\omega_{P'}|$
it follows that $\omega_T \subseteq \omega_{P'}$. On the other hand, we see that since $2^{n}R_T \cap 2^{n}R_{T'} \ne \emptyset$ we
have that $R_{P'} \subseteq R_{T'} \subseteq 2^{n+2}R_T$, which means $P'$ should have been eliminated at the same
stage $T$ was eliminated. The contradiction arises again.

This ends the proof of Proposition 3.19 in the case $i \in \{1,2\}$. The case $i = 3$ is entirely
classical. We are now dealing with two dimensional trees and two dimensional sizes that
naturally generalize their corresponding one dimensional counterparts. In short, we
successively eliminate maximal trees (with no distinction between lacunary and overlapping
this time, and no minimality assumptions on $\xi_T$). The fact that these trees will form a two
dimensional forest (in particular they are 3-lacunary) follows from Remark 3.9 and from
the fact that each selected tree is maximal. We omit the details.

24 Such $P$ and $P'$ must exist since trees are by definition non-empty.

3.1.7. Phase-space projections. Throughout this section we will assume $(T, \xi_T, R_T)$ is a
convex tree with $|R_T| = 1$, so that $j_T = 0$, and $\xi_T = 0$. We also work with $F_i$ as in
(10). We will consider pi > 2 and denote by &i — — for i G {1,2}, while 6% will be an 
arbitrary number in the interval [0, 1], independent of Pi,P2,P3- Our goal is to construct 
phase-space projections associated with each function F; t and the tree T, and to prove 
Proposition 3.37. In doing so, we follow the terminology and approach from [14]. 

We note that Xj has Fourier support in the region {(£,77) : |£| < 2 : > J ~ J } while Xj i n the 
region {(£,??) : |£|, \rj\ < 2 J i' J }. 

Let $w$ be a positive function on $\mathbb{R}$ and $r > 0$ be a number. We say that $w$ is essentially
constant at scale $r$ if there is a constant $C = O(1)$ such that

$$C^{-1}\Big(1 + \frac{|x-y|}{r}\Big)^{-100} \le \frac{w(x)}{w(y)} \le C\Big(1 + \frac{|x-y|}{r}\Big)^{100} \tag{36}$$

for all $x, y \in \mathbb{R}$. In particular, the weights $\tilde\chi_I^{\alpha}$ are essentially constant at scale $|I|$ or less
when $|\alpha| \le 100$.
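Condition (36) is easy to test numerically on sample points. The sketch below is an illustration only: it uses the model weight $w(x) = (1 + |x - c|/\ell)^{-a}$ as a stand-in for a power of the cutoff adapted to an interval of centre $c$ and length $\ell$ (the precise normalisation of the cutoffs in the text is not reproduced here, so this choice is an assumption), and it works with logarithms of the weights to avoid floating point underflow.

```python
import math
from typing import Callable, Sequence


def log_weight(x: float, center: float, length: float, a: float) -> float:
    """log of the model weight (1 + |x - center| / length)^(-a)."""
    return -a * math.log1p(abs(x - center) / length)


def essentially_constant(
    log_w: Callable[[float], float],
    r: float,
    samples: Sequence[float],
    C: float = 1.0,
    power: float = 100.0,
) -> bool:
    """Check both inequalities of (36), in logarithmic form, on all sample pairs."""
    for x in samples:
        for y in samples:
            bound = math.log(C) + power * math.log1p(abs(x - y) / r)
            if abs(log_w(x) - log_w(y)) > bound + 1e-9:
                return False
    return True
```

For the model weight the check succeeds with $C = 1$ at every scale $r \le \ell$, because $\log(1+u) - \log(1+v) \le \log(1 + (u - v))$ for $u \ge v \ge 0$; at scales much larger than $\ell$ it can fail, which matches the phrase "at scale $|I|$ or less".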

We shall need the following weighted version of Bernstein's inequality, see [14].

Lemma 3.28. Let $f : \mathbb{R} \to \mathbb{C}$ be a function whose Fourier transform is supported on an
interval $\omega$ of width $O(2^{jJ})$ for some integer $j$. Then we have

$$\|wf\|_\infty \lesssim 2^{jJ/2}\|wf\|_2$$

for all weights $w$ which are essentially constant at scale $2^{-jJ}$. The implicit constant in
the above inequality only depends on the constant $C$ from (36).

Proof We can write / = T m f where m is a suitable bump function adapted to 2uj. From 
the decay of the kernel of T m we thus have the pointwise estimate 

\f{*)\ = \T m f(x)\<V'J {1 + ^[ yl)W dy 
and the claim easily follows. ■ 

Lemma 3.29. Let (T,£ T ,i? T := I T x J T ) be a selected tree and j,f G Jt with j < j' . 
Then 

Ej',t Q Ej 7 t. 

Also, 

|| 2- j ' J #^,t,,||l ? (/ t) < |/t| 
ieJ T 

(with a similar statement for y ) , where 

Ej,t, x :=E JiT n({x}xR). 

Proof. The argument is the same as in Lemma 4.8 in [14], by noting that the cross
sections $E_{j,T,x}$ share the same properties as $E_{j,T}$. ■

Recall that Lemma 3.29 was used in Section 3.1.4 to replace the spatial truncations in
Proposition 3.16 by certain smoother variants of themselves, thus reducing the proof of
that proposition to proving (18).

Let now $(T, \xi_T, R_T)$ be a (not necessarily convex) tree. We also need to work with the
following variants $E_j$ of $E_{j,T}$ which enjoy better regularity properties.

Definition 3.30. Let $\mathcal{R}_T$ be the collection of all maximal dyadic squares $R \subseteq R_T$ which
have the property that $3R$ does not contain any of the squares $R_P$ with $P \in T$. For an
integer $j \ge 0$ let $E_j$ be the union of all squares $R$ in $\mathcal{R}_T$ such that $j_R \ge j$. For an integer
$j$ with $j < j_T$ we define $E_j = \emptyset$.

The sets $E_j$ obviously depend on the tree $T$, but we suppress this dependence.
Clearly the squares in $\mathcal{R}_T$ form a partition of $R_T$ and the sets $E_j$ are nested. The nice
regularity properties are stated in the following lemma:

Lemma 3.31. Any two neighboring squares in $\mathcal{R}_T$ differ by at most a factor $2^J$ in their
sidelength ratio.

The set $E_j$ is a union of $J$-dyadic squares of sidelength $2^{-jJ}$ and contains $E_{j,T}$ if
$j \in J_T$.
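The construction in Definition 3.30 is a Whitney-type decomposition, and its regularity can be observed on a simplified model. The sketch below is a one dimensional analogue with ordinary dyadic intervals (that is, $J = 1$) inside $[0,1)$; the encoding `(k, a)` and the function names are illustrative assumptions, not notation from the text. It computes the maximal dyadic intervals $I$ such that $3I$ contains none of the given tile intervals, and the sets playing the role of $E_j$.

```python
from fractions import Fraction
from typing import List, Tuple

# Hypothetical encoding: (k, a) stands for the dyadic interval [a 2^-k, (a+1) 2^-k).
Interval = Tuple[int, int]


def triple_contains(I: Interval, P: Interval) -> bool:
    """True if 3I (same centre as I, three times the length) contains P."""
    k, a = I
    kp, ap = P
    # 3I = [(a-1) 2^-k, (a+2) 2^-k); compare endpoints after clearing denominators.
    return (a - 1) * 2 ** kp <= ap * 2 ** k and (ap + 1) * 2 ** k <= (a + 2) * 2 ** kp


def maximal_intervals(tiles: List[Interval]) -> List[Interval]:
    """Maximal dyadic I in [0,1) such that 3I contains no tile, sorted left to right."""
    out: List[Interval] = []
    stack: List[Interval] = [(0, 0)]
    while stack:
        k, a = stack.pop()
        if any(triple_contains((k, a), P) for P in tiles):
            # The property fails here, so descend to the two dyadic children.
            stack.extend([(k + 1, 2 * a), (k + 1, 2 * a + 1)])
        else:
            out.append((k, a))
    return sorted(out, key=lambda I: Fraction(I[1], 2 ** I[0]))


def E(j: int, partition: List[Interval]) -> List[Interval]:
    """Analogue of E_j: the selected intervals of length at most 2^-j."""
    return [I for I in partition if I[0] >= j]
```

The top-down search returns maximal intervals because the defining property is inherited by subintervals. In this model adjacent selected intervals differ in length by a factor of at most 2, the analogue of the factor $2^J$ in Lemma 3.31, and the sets `E(j, ...)` are nested by construction.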

The following two lemmas will be mainly applied together and are the main ingredient 
behind the estimates on phase-space projections in Proposition 3.37. 

Lemma 3.32. If $R_0$ is a $J$-dyadic square with sidelength $2^{-j_0J}$ such that $3R_0 \cap E_{j_0} \ne \emptyset$,
then there is $P \in T$ such that $R_P \subseteq 10R_0$ and $j_P \ge j_{R_0}$.

Proof. There is a dyadic square $R_1$ with sidelength $2^{-j_0J}$ which is contained in $E_{j_0} \cap 5R_0$.
By the definition of $E_{j_0}$, $3R_1$ contains some $R_P$, with $P \in T$. Since $3R_1 \subseteq 7R_0$, it follows
that $R_P \subseteq 7R_0$. The claim now follows from the fact that both $R_0$ and $R_P$ are $J$-dyadic.



Lemma 3.33. We have 

UZ^T^F^.^x^h < \Rp\^ 2 size*(T) (37) 

for all indices 1 < i < 2, all trees T, multi-tiles P G T and symbols m P% adapted to uj p . 
of order N 2 . A similar statement holds for i = 3, with the obvious modifications. 
We also have 

WxZix^T^F^-^ix^h < lit^size^T) (38) 

for all indices 1 < % < 2, all trees T, multi-tiles P G T and symbols mp i adapted to 10c<jp- 
of order N 2 which in addition vanish at some point vp G 10up. 
Moreover, 

\\x^(x,y)T mR (F t (;y))(x,y)\\ 2 < \R\^m^(T) (39) 

for all indices 1 < % < 2, all non-lacunary trees T, all J -dyadic intervals R for which 
there isPeT with Rp C 10R and all symbols mp adapted to 5oo^ T) p. 

Proof Inequality (37) follows from the fact that ({P}, Rp) is a lacunary tree for some 
appropriate £, see Remark 3.7. 

Similarly (38) follows from the fact that ({P}, c(uo P ), R P ) is an overlapping tree and 
the fact that uj c (u) P ),r p = ^p- 

Now we consider (39). Observe first of all that $|R_P| \le |R|$ because both squares are
$J$-dyadic. By translating $R$, we may as well assume $R_P \subseteq R$. Namely, we have to translate
$R$ by at most ten times its length, and observe that $\tilde\chi_R$ stays the same up to some bounded
factor.
We consider the two cases $|R_P| = |R|$ and $|R_P| < |R|$.

Assume first \R P \ = \R\ and thus R P = R. Observe that by non-lacunarity and the 
fact that \uj p \ = \ou^ T: r\, it follows that oo P is strictly contained in 5uj^ Ti r. Hence 5ou^ Tj r 
is contained in 10u P and the two intervals are comparable in size. The claim now follows 
from (38) since m p also vanishes at any v G 10u P \ oo^ Tt R- 

Now assume \R P \ < \R\. We consider again the singleton tree {P} with top data 
(£, R) so that £ is an endpoint of 5u^ Tt R. Again, by non-lacunarity we see that these top 
data indeed turn {P} into a tree. Note that 5u^ TtP C IOu^r and the two intervals are 
comparable in size. It follows that the multiplier m p is adapted to IOu^r (with a possibly 
larger constant). Moreover, m R vanishes at any v G IOlo^r \ 5^ T)R and (39) follows by 
definition of the tree size. 



Remark 3.34. Inequality (37) controls projections associated with a multi-tile, and intervenes
in the estimates for the phase-space projection in the lacunary case (i.e. the case of a
lacunary tree when $i \in \{1,2\}$, and the case of a general tree when $i = 3$). See the proof of
Proposition 3.37. It was also used (via Lemma 3.35 below) to replace functions by their
phase-space projections in the model sum associated with a tree. See Section 3.1.4.

On the other hand, (39) controls the phase-space projection on a one dimensional tile-like
region which is not necessarily a one dimensional tile. This estimate will be used
repeatedly in the estimates for an overlapping tree in Proposition 3.37.

An easy application of the above lemma gives 

Lemma 3.35. For all 1 < % < 3 ; j G Jt ; and J- dyadic squares R with j R = j we have 

Wx^Ml^r) < litf^sizeKTf . 

Proof By interpolation it suffices to prove the bounds 

uT*Ml*{ R ) < ii?r /2 size:(T), 

n xl / 6 ~ c ii <r i 

\\Xj ^jFi\\L^(R) < 1, 

and (in the i — 3 case only) 

II^j KjF 3 \\L~(R) <size*(T). 

The second estimate is immediate from the boundedness of the F i: while the third follows 
from the first and (the 2 dimensional version of) Lemma 3.28. 

Thus it suffices to prove the first inequality. Fix i, j, R. There exists P G T with 
\Rp\ = \R\ such that we have the pointwise estimate 

on R. It thus suffices to show that 

\\Xr f *Mv S l^| 1/2 size*(T). 
This however was observed in (37).

For each $y \in J_T$ we define the cross sections

$$E_{j,y} := E_j \cap (\mathbb{R} \times \{y\}).$$

We also denote with Qj the collection of connected components of Ej. Note that such a 
U G Ctj may not necessarily be a square, however, from Lemma 3.31 we know it is a union 
of dyadic squares of sidelength 2~ jJ . It follows that each such U can be decomposed as 
a disjoint union of dyadic rectangles I x J, such that \J\ = 2 _ - ? J , / is a J-dyadic interval 
whose length is an integer multiple of 2~ J J and such that the line segments x\ x J and 
x} x J lie on the boundary of Ej (or equivalently, on the boundary of U). Here x\ and 
x\ are the left and right endpoints of /, respectively. We denote with Vlj the collection of 
all such rectangles I x J that arise by decomposing each U G 

Lemma 3.36. Let $(T, \xi_T, R_T)$ be a (not necessarily convex) tree. Then for each $y \in J_T$
which is not a dyadic point$^{25}$

$$\sum_{j\in J_T} 2^{-jJ}\,\#\partial E_{j,y} \lesssim |I_T|, \tag{40}$$

with the implicit constant independent of $y$.

For each $R = I \times J \in \Omega_j$, let $I^l_j$ and $I^r_j$ denote the intervals

$$I^l_j := (x^l_I - 2^{-jJ-1},\ x^l_I - 2^{-jJ-2}),$$
$$I^r_j := (x^r_I + 2^{-jJ-2},\ x^r_I + 2^{-jJ-1}),$$

and define $R^l_j := I^l_j \times J$ and $R^r_j := I^r_j \times J$.

Then the rectangles $R^l_j$ are disjoint as $j$ varies in the integers with $j \ge 0$ and $R$ varies
in $\Omega_j$.

Moreover, for any two such rectangles $R^l_j$ and $R'^l_{j'}$ their "horizontal" distance$^{26}$

$$\min_{(x,y)\in R^l_j,\ (x',y)\in R'^l_{j'}} |x - x'|$$

is at least $\max\{2^{-(jJ+3)},\ 2^{-(j'J+3)}\}$. Similar statements hold for the rectangles $R^r_j$.

Proof. The proof of the lemma is a reprise of the arguments involved in the proof of
Lemma 4.12 in [14]. The underlying philosophy is that for each $y$ the sets $E_{j,y}$ inherit
many of the properties of the sets $E_j$. To illustrate this principle, we will sketch the
argument.

Fix $y$ which is not a dyadic point and let $\mathcal{I}_{y,j}$ be the collection of the connected
components of $E_j \cap (\mathbb{R} \times \{y\})$. This collection is nothing else than the collection of intervals

$$\{R \cap (\mathbb{R} \times \{y\}) : R \in \Omega_j\}.$$

All the claims of the lemma will follow if we prove that $I \in \mathcal{I}_{y,j}$, $I' \in \mathcal{I}_{y,j'}$ and $j' > j$
imply that the distance between $I^l_j$ and $I'^l_{j'}$ is at least $2^{-jJ-3}$. Indeed, this will imply in
particular the disjointness of $I^l_j$ and $I'^l_{j'}$, which in turn will imply (40) (since all $I^l_j$ are
contained in $3I_T$).

Let now $I$ and $I'$ be as above; it remains to prove the claim about the distance. Let $J$
be the dyadic interval of length $2^{-jJ}$ containing $y$. Thus $I \times J$ is an element of $\Omega_j$. Let
$R_0 = I_0 \times J$ be the unique dyadic square of sidelength $2^{-jJ}$ such that the right endpoint
of $I_0$ coincides with the left endpoint of $I$. By the definition of $\Omega_j$ it will follow that
$R_0 \cap E_j = \emptyset$. Due to nestedness, this implies $R_0 \cap E_{j'} = \emptyset$. The claim follows. ■

25 The set of dyadic points, i.e. endpoints of dyadic intervals, forms a set of measure zero, so restricting
to its complement will not affect the later part of our argument. Sets of measure zero will be repeatedly
ignored in the following.

26 This distance is the same if measured at any $y \in J \cap J'$.

As an immediate consequence of the above proposition we have that 

E £(13-1 + 1^1) £1^1- (41) 

We introduce for each j > and 2 < p < oo the weight function 

H, P ^y) = J2 2 ' 4 ^ E (i+2^-^ir 100 . 

3'>0 z ^ d ^j',y 

This function will be used to quantify the extra gain we obtain when interacting different 
scales. 

Note that due to Lemma 3.36 we have sup^ Ha^Hl^ < p 1 and moreover, due to (40) 
we have 



Jr2 j 



<p \Rt\ (42) 



We will construct the associated phase-space projections as follows: 

Proposition 3.37. For each $1 \le i \le 3$ there exists a function $\Pi_i(F_i)$ such that

• ( Control by size) 

Hn^FOIU^I^r^sizeKT)^ (43) 

• ($\Pi_3(F_3)$ approximates $F_3$ on $T$): For each $j \in J_T$ we have

X>,F 3 = ^n 3 (F 3 ) (44) 

where Sj is a suitable 2 dimensional Littlewood-Paley projection to the frequency 
region {(£,??) : |f| < C 2^' +1 ) J , 2? J ~ J < \q\ < 2^' J+J } 

• ($\Pi_i(F_i)$ approximates $F_i$ on $T$, $i \in \{1,2\}$; the lacunary case): Assume $T$ is lacunary.
Then for each $i \in \{1,2\}$, $j \in J_T$

XjitjFi = SjlUW (45) 

where Sj is a suitable 1 dimensional Littlewood-Paley projection to the frequency 
region {£ : 2^~^ J < |f | < 4000 x 2^' +1 ) J } 

• ($\Pi_i(F_i)$ approximates $F_i$ on $T$, $i \in \{1,2\}$; the overlapping case): Assume $T$ is
overlapping. Then for each $i \in \{1,2\}$, $j \in J_T$ and all $J$-dyadic squares
$R_0 = I_0 \times J_0$ with $j_{R_0} = j_0$ we have

\\x)i%M - IL(F))|| iMi?o) < size^T)^!^!^- 1 / x\» 30 , Pi (46) 

JRo 
• (Local control by size) For each $1 \le i \le 3$, $j \in J_T$ and each $J$-dyadic $R$ with
3r = jo we have 

ll^^n^)!!^) < size*(Tf li^. (47) 

Proof We define 

n 3 (F 3 ) := 

jeJ T 

and in the case the tree is lacunary and i G {1, 2} 

n,(F 4 ) := XjtjFi. 
jeJ T 

As mentioned before, we only need to smoothen out the spatial component which is 
associated with a frequency projection, and we do that in order to preserve frequency 
localization, and thus orthogonality. If no frequency projections are present (and this is 
the case with the y component of both F 1 and F 2 ), then rough spatial cutoffs suffice. 

The proof that the above projections satisfy all the required estimates follows exactly 
the same lines as the proof in the lacunary case of Proposition 7.4 in [14]. This is a fairly 
easy exercise compared to the overlapping case presented next, since (for J large enough) 
the portions of the projections corresponding to different scales are pairwise orthogonal. 
We omit the details. 

We now construct the projections in the case $i \in \{1,2\}$, when the tree is assumed overlapping. The construction again closely follows that of the non-lacunary case in Proposition 7.4 in [14], with a few modifications. We include the argument for completeness.

We start by defining the projection. The projections turn out to be identical for $i = 1$ and $i = 2$. We will drop the $i$ index on the function $F_i$.

For each real number $j$, let $T_j$ be a one dimensional Fourier multiplier (defined, say, by dilations of a fixed multiplier) whose symbol is supported in the frequency region $\{|\xi| \le 2^{2+jJ}\}$, and equals 1 for $\{|\xi| \le 2^{1+jJ}\}$. Let $S_j$ be the associated Littlewood-Paley projections $S_j := T_j - T_{j-1}$. We may assume that the kernels of $T_j$ and $S_j$ are real and even. These multipliers will act on the first variable of functions on $\mathbb{R}^2$.

A first guess as to the construction of $\Pi_i(F)$ would be

$\tilde\Pi(F)(x,y) := \chi_E T_{j(x,y)}F,$

where for each $(x,y) \in E$ we define the integer-valued function $j(x,y)$ by

$j(x,y) := \max\{j \ge 0 : (x,y) \in E_j\}.$

One can expand $\tilde\Pi(F)$ as a telescoping series:

$\tilde\Pi(F) = \chi_E T_0 F + \sum_{1 \le j} \chi_{E_j} S_j F$ (48)

$= \chi_E T_0 F + \sum_{1 \le j} \sum_{R \in \Omega_j} \chi_R S_j F.$ (49)

This proposed projection turns out to obey (43), but does not obey (46) due to the poor frequency localization properties of the characteristic functions $\chi_R$ in (49). Specifically, the cutoffs destroy the vanishing moments of the $S_jF$, and this will cause a difficulty when trying to sum in $j$ because the projection $\pi_j$ is non-lacunary.
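The telescoping identity behind (48) can be checked pointwise. The following is a short verification, written under the assumption (not restated in this passage) that the sets $E_j$ are nested, $E = E_0 \supseteq E_1 \supseteq \cdots$, and that $j(x,y)$ is finite on $E$.

```latex
% Assumption: E = E_0 \supseteq E_1 \supseteq \cdots and j(x,y) < \infty on E.
\begin{align*}
T_{j(x,y)}F &= T_0F + \sum_{k=1}^{j(x,y)} (T_k - T_{k-1})F
             = T_0F + \sum_{k=1}^{j(x,y)} S_kF, \\
\chi_E(x,y)\,T_{j(x,y)}F(x,y)
  &= \chi_E(x,y)\,T_0F(x,y) + \sum_{k \ge 1} \chi_{E_k}(x,y)\,S_kF(x,y).
\end{align*}
% The second line uses that, by nestedness, for k \ge 1 one has
% (x,y) \in E_k exactly when k \le j(x,y).
```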

To get around this problem we shall modify each term $\chi_R S_jF$ to have a zero mean on each $y$ fiber. In order that these modifications do not collide with each other, we shall place them in disjoint rectangles, namely in the rectangles $R^l_j$ and $R^r_j$ constructed in Lemma 3.36.

Let R = I x J G flj for some j > 1. Let j and ■ be some functions supported in 
and J[, respectively, uniformly bounded by 10 and with total mass 

l _ / AX LU. _ n-jJ 



(f)jj(x)dx = J (jf I j(x)dx 

Decompose $\chi_I$ as $\chi_I = H^l_I + H^r_I$, where $H^l_I(x) := H(x - x^l_I)$ and $H^r_I(x) := -H(x - x^r_I)$ are shifted Heaviside functions. For each $y \in J$ define the quantities $c^l_{I,j}(y)$ and $c^r_{I,j}(y)$ by



4j(y):=y J J H\{x)S 3 F{ X) y)dx , 



Define now the functions $\phi^l_{R,j}(x,y) := \phi^l_{I,j}(x) \times c^l_{I,j}(y)$ and $\phi^r_{R,j}(x,y) := \phi^r_{I,j}(x) \times c^r_{I,j}(y)$. Note that they are supported on $R^l_j$ and $R^r_j$, respectively and that

$\int [\chi_R(x,y) S_jF(x,y) - \phi^l_{R,j}(x,y) - \phi^r_{R,j}(x,y)]\,dx = 0$ (50)

for each $y \in J$.
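The mean-zero mechanism in (50) can be illustrated numerically on a single $y$ fiber. The sketch below is only an illustration under stated assumptions: the function `g` stands in for $S_jF(\cdot,y)$, the two bumps are normalized indicators with unit mass placed just outside the interval (the supports and the normalization used in the paper are different and are not reproduced here), and all names are hypothetical.

```python
import math

def check_mean_zero_correction(g, a, b, lo=-20.0, hi=20.0, n=40000):
    """Riemann-sum version of (50) on one fiber; returns (c_l, c_r, integral)."""
    dx = (hi - lo) / n
    xs = [lo + (k + 0.5) * dx for k in range(n)]           # midpoint grid
    h_l = [1.0 if x >= a else 0.0 for x in xs]             # H(x - a)
    h_r = [-1.0 if x >= b else 0.0 for x in xs]            # -H(x - b)
    chi = [p + q for p, q in zip(h_l, h_r)]                # indicator of [a, b)
    gs = [g(x) for x in xs]
    c_l = sum(h * v for h, v in zip(h_l, gs)) * dx         # integral of H^l * g
    c_r = sum(h * v for h, v in zip(h_r, gs)) * dx         # integral of H^r * g

    def unit_bump(left):
        # Normalized indicator of [left, left + 1): discrete total mass 1.
        raw = [1.0 if left <= x < left + 1.0 else 0.0 for x in xs]
        mass = sum(raw) * dx
        return [r / mass for r in raw]

    # Hypothetical supports: one unit interval on each side of [a, b).
    phi_l, phi_r = unit_bump(a - 1.0), unit_bump(b)
    integral = sum(c * v - c_l * p - c_r * q
                   for c, v, p, q in zip(chi, gs, phi_l, phi_r)) * dx
    return c_l, c_r, integral

def g(x):
    # Stand-in for S_j F(., y): smooth, rapidly decaying, mean zero on the line.
    return (x - 0.3) * math.exp(-(x - 0.3) ** 2)

if __name__ == "__main__":
    print(check_mean_zero_correction(g, 0.0, 1.0))
```

The third returned value vanishes up to rounding because the indicator of the interval is the sum of the two shifted Heaviside functions, so subtracting bumps whose masses equal the two Heaviside integrals cancels the mean exactly.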

We can now define the correct form of the projections

$\Pi_i(F) := \tilde\Pi(F) - \sum_{1 \le j} \sum_{R \in \Omega_j} (\phi^l_{R,j} + \phi^r_{R,j}).$

The control on the functions $\phi^l_{R,j}$ (and a similar control holds on $\phi^r_{R,j}$, too) is provided by the following lemma:

Lemma 3.38. Let $R = I \times J \in \Omega_j$. Then we have the estimates

I J (v)l < 2 jJ f \SjF(x,y)\dx 

$\|\phi^l_{R,j}\|_2 \lesssim \mathrm{size}^*_i(T)\,|R|^{1/2}$ (52)

and

$\|\phi^l_{R,j}\|_\infty \lesssim 1$ (53)

Proof. Estimate (51) was proved in Lemma 8.1 in [14]. Note also that (53) is a consequence of (51) and the fact that $\|F\|_\infty \le 1$.

It remains to prove (52). Denote by $R'$ the $J$-dyadic square $[x^l_I, x^l_I + 2^{-jJ}] \times J$. Note that $R'$ is a subset of $E_j$. By Lemma 3.32 it follows that there exists $P \in T$ with $R_P \subset 10R'$. Using this and the fact that $S_j$ is associated with a multiplier adapted to $5\omega_{\xi_T,R'}$, (52) will now follow from (39). ■
We now prove (43) in the case $i \in \{1,2\}$ and the tree is overlapping. It suffices by interpolation to prove that

$\|\Pi_i(F)\|_\infty \lesssim 1$ (54)

and

$\|\Pi_i(F)\|_2 \lesssim |R_T|^{1/2}\,\mathrm{size}^*_i(T).$ (55)

Let us start with (54). By (53) and disjointness of supports, the contribution of $\phi^l_{R,j}$ and $\phi^r_{R,j}$ is acceptable. It remains to prove that

$\|\tilde\Pi(F)\|_\infty = \|T_{j(x,y)}F(x,y)\|_\infty \lesssim 1.$

Note however that this is an immediate consequence of (10).
Next, we focus on (55). Again, it suffices to prove

$\|\tilde\Pi(F)\|_2 \lesssim |R_T|^{1/2}\,\mathrm{size}^*_i(T)$ (56)

and

£E(W + W^l^l 1/2 ( 57 ) 

j>i ReQj 

(by taking into account (52) and the disjointness of the supports of $\phi^l_{R,j}$ and $\phi^r_{R,j}$). The last inequality was however observed in (41).
To prove (56) we expand

$\tilde\Pi(F) = \sum_{0 \le j} \chi_{E_j \setminus E_{j+1}} T_jF = \sum_{0 \le j}\ \sum_{R \cap E_j \setminus E_{j+1} \ne \emptyset:\ j_R = j,\ R \text{ is } J\text{-dyadic}} \chi_{R \cap E_j \setminus E_{j+1}} T_jF.$

As $j$ and $R$ vary in the above sum, the sets $R \cap E_j \setminus E_{j+1}$ are pairwise disjoint, hence it suffices to show



E E W T 3 F Wh(R) S l^ T |size*(T) 



0<j flnB J \E j+1 ^0 :JH =j 

R is J — dyadic 

For each j and R in this sum there is a J- dyadic square Rl C R with j R > — j R + 1 
which is contained in Ej\Ej +1 . This follows from Lemma 3.31. As j and R vary, these 
intervals R' are pairwise disjoint. Hence it suffices to show that 

WTjFWvw < l^l^size^T) 

for all $j$, $R$ in the above sum. But for such $j$, $R$ we can find a multi-tile $P \in T$ with $R_P \subseteq 10R$ by Lemma 3.32. The claim then follows from (39), since $T_j$ is associated with a multiplier adapted to $5\omega_{\xi_T,R}$. This proves (56) and thus (55). The proof of (43) is now complete.

The estimate (47) will follow from (46), Lemma 3.35, the triangle inequality, and the fact that the $\mu_{j,p}$ are uniformly bounded. Thus it only remains to verify (46).

Fix $j_0 \ge 0$ and $R_0$ such that $j_{R_0} = j_0$. From the frequency support of $\pi_{j_0}$ we may replace $F - \Pi_i(F)$ with $T_{j_0}F - \Pi_i(F)$.
We now decompose

T JO F-U t (F)=x n \E j T JO F (58) 

-Xn\E j0 MF) (59) 

+ Xn\E j0 E E 4 (6°) 



"Jo 

i<i<io Ren, 



+ Xn\ EjQ E E<^ (61) 



"Jo 

i<i<io flefy 



jo<j R=IxJ£ftj 

E E (63) 



where 



$G^l_{R,j}(x,y) := H^l_I(x)\chi_J(y)S_jF(x,y) - \phi^l_{R,j}(x,y)$
$G^r_{R,j}(x,y) := H^r_I(x)\chi_J(y)S_jF(x,y) - \phi^r_{R,j}(x,y)$

The first four terms are treated exactly like in [14]. Since they are supported outside 
Ej , only the part of fij will enter the estimates, where we define 

*%£(*,V)= E E (l + 2 J ' J k-^|)- 

o<i<io zedE; „ 



-100 



Note that is constant on J-dyadic squares with sidelength 2~i° J , in particular on 

Rq. The estimates for the first four terms above will follow by interpolation from the 
following two estimates 

llxf%.(( 58 ) " (59) + ( 6 °) + (61)) 11^(^0) < TTrW ize *( T ) / 



Wxl % ((58) - (59) + (60) + (61))|Uoo (ilo) < n^size*(T) / ^ . 
We omit the details. 

We however sketch the proof of the estimate for the terms (62) and (63), since this 
is the more delicate case. Again, all the needed technology is already in [14], but some 
extra care is needed, since in this case Rq interacts with scales j greater than jo- Due 
to this, it is : = fJ>j ,pi — ^jfpl ^ na ^ w ^ enter our estimates. Since rf^ is no longer 
constant on Rq, for each j > jo we will have to split Rq in rectangles of the form Jo x J' 
with | J'\ = 2~i L and get estimates on each of these rectangles, which would then add up 
to the desired global estimate (i.e. on the whole R ). 

The factor xj^ 6 will be useless here, the decay will come from other sources. More 
precisely, we will prove that for each j > j and each R := I x J e Qj we have 

UjoGrJl^Ro) < t^tt [ [ XRo (x,y)(l + ^ J \x - x\\)- WQ dxdy (64) 

\ n 0\ Jy&J JxeR 
oO'o-jV/2 r r

l-ttol ' JyejJxeR (65) 

with similar estimates for $G^r_{R,j}$. If we have these estimates, all we have to do is interpolate them for each $j$, and then add the resulting estimates over all $j > j_0$ and all $R \in \Omega_j$. Note that we do not get any extra decay of the form $2^{(j_0 - j)\epsilon}$ for the $L^\infty$ estimate in (64). We do however obtain such a decay for the $L^2$ estimate in (65), and by interpolation, the decay factor will have a $p_i$ dependence. This is the main source of the $p_i$ dependence of the function $\mu_{j,p_i}$.

Fix $j > j_0$ and $R = I \times J \in \Omega_j$. We start by proving an estimate for $G^l_{R,j}$. Observe from Fourier support considerations that for each $y$

$\pi_{j_0}((T_{j-1}H^l_I)S_jF) = 0,$

in particular, $(T_{j-1}H^l_I)S_jF$ has mean zero for each $y$. It thus suffices to get estimates for

$F^l_{j,R}(x,y) := G^l_{R,j}(x,y) - (T_{j-1}H^l_I(x)\chi_J(y))S_jF(x,y)$

$= [(1 - T_{j-1})H^l_I](x)\chi_J(y)S_jF(x,y) - \phi^l_{R,j}(x,y).$

We aim first at showing that for each y G J and each x G R 

\fU*,»)\ S 2 JJ/2 d + *V - 4l)- 2M ll ( J|^! |)50 ll^w (66) 

This estimate is clear for the 4> l Rj part of F- Ri , due to (51). It remains to prove it for 

[(l-Tj-MWxjmSjFfay). ' 
Fix $y \in J$. From repeated integration by parts we have the pointwise estimate

$|(1 - T_{j-1})H^l_I(x)| \lesssim (1 + 2^{jJ}|x - x^l_I|)^{-250}$
so it suffices to show that

\S,F( X ,y)\ < 2^(1 4- a"|* - *\r\\ {1+ %%L% n J \^y 

Note however that this is an immediate consequence of Lemma 3.28. 

We next use the pointwise bounds for $F^l_{j,R}(x,y)$ in (66) and the fact that $F^l_{j,R}(\cdot,y)$ has mean zero to get the following bound for the antiderivative $\nabla^{-1}F^l_{j,R}$

IV-'^x. y)| < 2-^(1 + 2"|» - x- (l -jl^J^ n^. (67) 
We continue by noting that 

*A = vit jo (v- 1 Fj tR ). 

An easy computation shows that if xo G Io then 

iV^V-^Ky)! < 2^> J J \xl(x')W- l F l hR (x',y)dx'\ (68) 

< 2 ( 2j - -§)J|| [ xl{x'){l + 2i J \x' -xWY^dx' 

ll (l + 2^|x / -4|) 5oN {X) J A 1 /I; ^ 
The estimate (64) now follows from the above and the fact that

II \SjF(^y)\ || <2 -:j/2 
"(l + 2^|a:'-4|) 5ol|L {x) ~ 

To prove (65) we first denote by $R'$ the $J$-dyadic rectangle defined as $[x^l_I, x^l_I + 2^{-jJ}] \times J$. We further observe that (69) implies that

\\^jo G R,j\\L 2 (,R ) = \\^jo F R,j\\L 2 (R ) 

■■■ J {[ .Tr-i'V" J xl( X )(l + ^\x-x[\)-^dx 

< 2&-to J \\x%S j F\\ 2 J xl(x)(l + V J \x - x\\Y™dx 

< 2(^-D J size*(T) f I xl o {x,y){l + y J \x-x\\y™dxdy 

Jj&j J 

^ ~IWW^ S ^( T ) / / j&o(x,y)(l + V J \x - x^dxdy 
\ n o\ 1 J y eJ AeR 

where the penultimate inequality follows from Lemma 3.32 and (39). This ends the proof of (65) and of Proposition 3.37.

3.2. The cone $\int \Psi = 0$.

Our goal now is to prove (2) in the case $\int \Psi = 0$. We will first note that the techniques developed in Section 3.1, combined with the type of analysis that solved the one dimensional Bilinear Hilbert Transform (see [10], [11], and also [16] for a more detailed exposition), can address this case, too. Here is a brief explanation why.

Note that the model sum in (2) represents a (one dimensional) Bilinear Hilbert Transform in the $x$ coordinate. This is due to the special cancellation condition $\int \Psi = 0$. One works with one and a half dimensional trees and phase-space projections as in Section 3.1. The difference is that there will be no orthogonality coming from the $y$ component of the third function (as was the case before; in particular trees are not automatically 3-lacunary), but rather from the special localization in the $x$ component (the same type of localization as in the case of the one dimensional Bilinear Hilbert Transform). The same sizes $\mathrm{size}^*_i$ will control phase-space projections of $F_i$ when $i \in \{1,2\}$. The property of being lacunary or overlapping will be determined only by the $x$ component. For each $i \in \{1,2,3\}$ we will have trees which are $i$-overlapping, and they will necessarily be $i'$-lacunary for each $i' \ne i$. Some of these features are present in the alternative argument we present below.

We choose to present this alternative argument, since it provides a slightly different 
angle, and since it is "cleaner" for exposition purposes. One of its advantages is that 
it avoids 27 the technicalities behind phase-space projections, that were present in the 
previous section. 

3.2.1. Discretization. 

The collection $\mathbf{P}$ of multi-tiles in this context will consist of $P = I_P \times \omega_{P_1} \times \omega_{P_2} \times \omega_{P_3}$ with the following properties



27 We will be able to discretize in such a way that the phase-space projections enter the picture in a 
natural way 
Definition 3.39.

• Each component $\omega_{P_i}$ of some $P \in \mathbf{P}$ determines uniquely the other two (frequency) components $\omega_{P_j}$ of $P$.

• $\omega_{P_i}$ are elements of a shifted dyadic grid, while $I_P$ is an element of the standard dyadic grid

• $|\omega_{P_1}| = |\omega_{P_2}| = |\omega_{P_3}| = |I_P|^{-1}$

• $|\omega_{P_i}| = 2^{Jj}$ for some $j \in \mathbb{Z}$, where $J \in \mathbb{N}$ is a fixed large enough natural number. Such intervals will be referred to as $J$-dyadic.

• $|\omega_i| = |\omega'_i|$ and $\omega_i \ne \omega'_i$ imply $\mathrm{dist}(\omega_i, \omega'_i) \ge 2^J|\omega_i|$

• If for some £ G R we denote (£1,62) £3) : = — 2£); then ^ G 2uj p . for some 
i G {1, 2, 3} implies that ^ G \ 2cj Pj /or eac/i j 7^ %, where C is some large 
universal constant. 

If $P$ is a multi-tile, we denote by $P_i := I_P \times \omega_{P_i}$ the associated tiles.
Let $\varphi$ be a function whose Fourier transform is adapted to $[-1/2, 1/2]$. We will denote by $\varphi_{P_i}$ the wave-packet localized in the tile $I_P \times \omega_{P_i}$, that is


By using standard reductions, in order to get bounds in this case for (2), it suffices to 
prove the boundedness of the model sum 



where $J_{y,P}$ is the unique dyadic interval of length $|I_P|$ containing $y$, and the supremum above is taken over all $\psi$ with Fourier transform adapted to $[-1/2, 1/2]$.

We will change the angle a bit and rewrite the above expression in a slightly different 
way. 

Definition 3.40. A hyper-multi-tile $(P, J)$ is a multi-tile with an extra spatial component $J_P$, where $J_P$ is dyadic and $|I_P| = |J_P|$.

A hyper-multi-tile also has an extra frequency component, that is [— ||J| -1 , 
Since this component is implicit, we will omit it, and always write (P, J). 

A hyper-multi-tile will serve the purpose of localizing in time-frequency 2 dimensional 
wave-packets like x tpj. 

Let $\mathbf{P}_{hyper}$ be an arbitrary finite collection of hyper-multi-tiles. For each $y$ we denote by $\mathbf{P}_y$ the collection of multi-tiles $P$ such that $(P, J_{y,P}) \in \mathbf{P}_{hyper}$.

To simplify notation, for each $y$ and each $P \in \mathbf{P}_y$ define




We will also use the notation 





$a_{P_i}(y) = |\langle F_i(x',y), \varphi_{P_i}(x')\rangle_{x'}|$

for $i \in \{1,2\}$ and



1 



sup I (P 3 (V , y') , cpp 3 (x')ip Jy P {y')) x ',y' 

* J v,P 
Define also for each $(P, J) \in \mathbf{P}_{hyper}$

$b_{P,J} := \sup_{\psi_J} |\langle F_3(x',y'), \varphi_{P_3}(x')\psi_J(y')\rangle_{x',y'}|,$

and

1 3 

7(/- ; . /•,./•;,)(,•.//): J2 T^^ nri!l)Xl 



PeP 

2 



|7~p tl a ^(y) b p,JXi P (x)xj(y)- (70) 



(P,J)€P hyper ' P ' »=1 



A standard limiting argument shows that it suffices to prove
Theorem 3.41. For each $2 < p_i < \infty$ with $\frac{1}{p_1} + \frac{1}{p_2} + \frac{1}{p_3} = 1$ we have

$\|T(F_1,F_2,F_3)\|_{L^1_{x,y}} \lesssim \prod_{i=1}^3 \|F_i\|_{p_i}.$

Moreover, the implicit constant in the above inequality does not depend on $\mathbf{P}_{hyper}$.

3.2.2. The proof of Theorem 3.41. We now fix $\mathbf{P}_{hyper}$, and will not index any quantity ($T$, $\mathbf{P}_y$, etc) by it.

By further invoking interpolation and the dilation invariance of our operator, it suffices 
to prove that for each 2 < p; < oo with — + — + — = - and — < ^ and each llFjlL. = 1 

f fl pi P2 P3 P Pi 2' ii 8 hp* 

we have 

$|\{(x,y) : T(F_1,F_2,F_3)(x,y) > 1\}| \lesssim 1.$

The way we prove this is by constructing an exceptional set $E \subset \mathbb{R}^2$ with $|E| \lesssim 1$ such that for some appropriate $t > 1$

$\int_{E^c} T(F_1,F_2,F_3)^t(x,y)\,dxdy \lesssim 1.$ (71)

The set $E$ will be constructed in a few stages. To prove (71), we may and will assume that all $(P, J)$ that contribute to $T$ in (70) satisfy

$I_P \times J \not\subset E.$ (72)

Definition 3.42. Let $\xi_T \in \mathbb{R}$ and let $R_T$ be a $J$-dyadic square. A two dimensional $i$-tree with top data $(\xi_T, R_T)$ is a collection $T$ of hyper-multi-tiles with the property that $\xi_T \in 2\omega_{P_i}$ and $I_P \times J \subset R_T$ for each $(P, J) := I_P \times J \times \omega_{P_1} \times \omega_{P_2} \times \omega_{P_3} \in T$.

A one dimensional $i$-tree with top data $(\xi_T, I_T)$ is a collection $T$ of multi-tiles with the property that $\xi_T \in 2\omega_{P_i}$ and $I_P \subset I_T$ for each $P := I_P \times \omega_{P_1} \times \omega_{P_2} \times \omega_{P_3} \in T$.

Note that if $(T, \xi_T, R_T)$ is a two dimensional tree, then for each $y$, its restriction to the fiber above $y$

$\{P : (P, J_y) \in T\}$

is a one dimensional tree with top data $(\xi_T, I_T)$, that we denote by $T_y$. We will refer to it as the tree induced by $T$.
We further comment on our strategy to prove (71). For each $y$ denote

T y (F u F 2 , F 3 )(x) :=T(F 1 ,F 2 ,F 3 )(x,y)= -^a Pl {y)a P2 {y)ap A {y)xi P {x). 

Our plan is to get estimates for $\int_{E_y^c} T_y^t(F_1,F_2,F_3)(x)\,dx$ outside the fibered exceptional set $E_y := E \cap (\mathbb{R} \times \{y\})$. These estimates will then be integrated over $y$ to get (71).

To estimate $T_y$ we will split the collection $\mathbf{P}_y$ into trees, and will make sure that we gain some control over both $a_{P_i}(y)$ and also over the counting function of the tree tops. The basic estimate for an $i$-tree $T_y \subset \mathbf{P}_y$ will be

1 ^ x (^) a ■ 

' I T 11/2 1 I ap i(y) I T I ll BM Ox < ||(ap,-)p6T 1) ||BM0||(aPj)p e T 1) ||BM0||(i-p- |fw )peT y ||oo, 

p!^ b l /p l 7 f=i \ Ip \ 1 (73) 

where I ^ j G {1, 2, 3} \ {2} and 

1/2 

.X/p(^)||1/2 



(opJpgtJIbmo := sup — ap. ~ || (ap 3 ) 5 

/dyadic \ |J| PeTy . IpQI I p eTy 



\BMO x 



Note that $\|\cdot\|_{BMO}$ majorizes $\|\cdot\|_\infty$, so it will suffice to control the former, for each $i$.

The index $i = 3$ plays a special role. To ensure control over $\|(a_{P_3})_{P \in T_y}\|_{BMO}$, we will need to look at $\|(a_{P_3})_{P \in T_y}\|_{BMO}$ as being the restriction (to the $y$ fiber) of a similar two dimensional quantity (the 3-size). In short, to control $\|(a_{P_3})_{P \in T_y}\|_{BMO}$, instead of selecting one dimensional trees in $\mathbf{P}_y$, we will select two dimensional trees in $\mathbf{P}_{hyper}$, and then restrict them to $\mathbf{P}_y$. We explain this procedure below.

Definition 3.43. The 3-size $\mathrm{size}_3(\mathbf{P}^*_{hyper})$ of a finite collection $\mathbf{P}^*_{hyper} \subseteq \mathbf{P}_{hyper}$ of hyper-multi-tiles is defined as

. 1/2 

icJr hyper \ x 1 (pj)gT 

where the supremum above is taken over all 1-trees and 2-trees. 

The following Bessel type inequality is standard (see also similar results in the previous 
section). 

Lemma 3.44. Assume that for each $(P, J) \in \mathbf{P}_{hyper}$ the square $I_P \times J$ intersects the complement of the set$^{28}$

$E_3 := \{(x,y) : M_{p_3}F_3(x,y) > 1\}.$
We assume as before that $\|F_3\|_{p_3} = 1$. Then we can split

P hyper ■ J P 

where 



m 

hyper 

m>0 



size 3 (P^J < T 



28 $M_p$ denotes the $L^p$ version of the Hardy-Littlewood maximal function
and $\mathbf{P}^m_{hyper}$ is the union of a family $\mathcal{F}_m$ of pairwise disjoint $i$-trees ($i \in \{1,2,3\}$) satisfying

II XRjBMO<2 2m , (74) 



R T C {(x,y) : M P3 F 3 (x,y) > 2~ m }, (75) 
\Rt\ < 2 (2+P3)m , (76) 



and, if the tree is a 1-tree or a 2-tree then

XlpXj 



II h 2 XI P x.J || < o-2m / 77 \ 

II 2^ p ' J ij x J| ^ BMQ ~ ' ^ ' 

(P,J)eT 1 p 1 
while if the tree is a 3-tree, then

IK & p,j)(P,J)eT||oo < 2 _2m . 

We will state a few consequences of the above. For a.e. $y$$^{29}$ we let $T_y$ be the one dimensional tree induced by $T$, and denote by $\mathcal{F}_{m,y}$ the collection of these trees.

A standard application of John-Nirenberg's inequality, together with (74) and (75) 
implies that there is E™ such that \E£\ < T Mm and 

II £ ^ T ||oo<2( 2+£ ) m . (78) 

TEf m 

Here and in the following e can be thought of as being as small as we want, while M as 
large as we want. We put E£* := [J m E™ in the exceptional set E. From this, (72) and 
(78) we get 

II £ X/tJoo £ 2(2+e)m - ( 79 ) 

Another application of John-Nirenberg's inequality combined with (77) implies that if $T \in \mathcal{F}_m$ is a 1-tree or a 2-tree then



E,2 XlpXj || ^ _( 2 _ e ) r 

(p,./)eT 1 ^ 1 

I P x J^E S t 



for some £ 3iT C i? T with \E 3>T \ < 2- Mm \R T \. 
An immediate consequence is that 

II ^'Jv II - II a2p 3^ V \ , II < o-(2-e)m /on\ 

|| > |TTX/p II L~(R) - II >. | r | Xlp\\L°°(R) ^ 2 . (8UJ 

— i P — i p 

(P,J)eT 1 ^ 1 -P6T H 



ye.J 



29 More precisely, for $y$ not a dyadic point
We put both $E_3$ and $\bigcup_m \bigcup_{T \in \mathcal{F}_m} E_{3,T}$ in the exceptional set $E$. By (76), these have $O(1)$ measure. From this, (72) and (80) we have for each $T_y \in \mathcal{F}_{m,y}$

H a P 3 (y) ^ || < -(2-e)m / ai \ 

II 2^ I j I XJ p ||l°°(R) £ 2 ■ (°1J 

A final consequence of Lemma 3.44 that we mention is that if $T \in \mathcal{F}_m$ is a 3-tree, then



- a P 3 \ ;| <- o-2'/' 



;PeT„||oo r5 



< 2~ 2m (82) 



We will continue to think about $y$ as being fixed. We have so far learned how to estimate the third component $a_{P_3}(y)$, see (79), (81) and (82).

The control of the first two components $a_{P_i}(y)$, $i \in \{1,2\}$ is completely standard. We will have a purely one dimensional selection algorithm for trees, in particular we will not use two dimensional trees.

Lemma 3.45. Let $i \in \{1,2\}$. Assume that for each $(P, J) \in \mathbf{P}_{hyper}$, $I_P \times J$ intersects the complement of the set

$E_i := \{(x,y) : M_{p_i}F_i(x,y) > 1\}.$
We also assume as before that $\|F_i\|_{p_i} = 1$. Then we can split

$\mathbf{P}_y = \bigcup_{m \ge 0} \mathbf{P}^m_y$

in such a way that $\mathbf{P}^m_y$ is the union of a family $\mathcal{F}^i_{m,y}$ of pairwise disjoint trees satisfying

II ^2 X/tIIbmo, < 2 2m , (83) 

I T c{x:M p3 , x F 3 (x,y)>2- m }, (84) 
£ | J T | < 2^ m J M^x, y)dx, (85) 
and, if the tree is a $j$-tree with $j \ne i$ then



E4,,|fjllBMO,<2- 2m , (86) 
PeT y 1 p| 



while if the tree is an $i$-tree, then



,2 



" i,<y )p^ v U<2- 2m . (87) 



We now put all the pieces together. Put $E_1$ and $E_2$ in $E$. Let $\mathbf{m} := (m_1,m_2,m_3)$ and assume that each $m_i \ge 0$. Denote by $|\mathbf{m}| := m_1 + m_2 + m_3$. Let $\mathcal{F}_{\mathbf{m},y}$ be the collection of trees obtained by intersecting triples of trees, one from each $\mathcal{F}^i_{m_i,y}$. For such a tree $T_y$ we get by using (73), (81), (82), (86) and (87), and by invoking John-Nirenberg again,

11 E ifVn^ ( ^¥r lu - 2 " H(1 " e) ' (88) 

PeTy ' p ' i=i ' p ' 
To get the above, we actually assume that for each $T_y$ we have eliminated an exceptional set $E_{T_y}$ of measure $O(2^{-M|\mathbf{m}|})|I_{T_y}|$. More precisely, we add to $E$ the two dimensional exceptional set containing all $E_{T_y} \times \{y\}$, for all $y$. It is easy to see, due to (85), that the union of these sets has measure $O(1)$.

We can now evaluate the BMO norm of the operator associated with the forest $\mathcal{F}_{\mathbf{m},y}$, defined by



! 1 p | 



We have 



\rp II < -|m|(l-e)|| || 

M^JIbmo^ 'II 2^ Xit v \\bmo x 



< 2 -H( 1 -)min|| X/tJhmo. 

< 2 -l m l( 1 -2^2^^r (89) 

where the last inequality follows from (79) and (83). 

We note that due to (75) and (84), T Tfhy is supported in each of the sets 

{x:M Pi F i (x : y)>2- m <} : 

It follows that the size of the support of T-r^ is 

■L -L J m,y 

3 

<H(2^ [M P M^y)Tdx)^. 

i=l J 

Finally, by invoking this, (89) and the initial assumption that ^ < 2 we get for sufficiently 
large t 

3 

\\T^Jt< H^IbmoII^ [M P Mx,y)] Pi dx)% 

i=i ^ 

< 2 £^(f+fH-H) | [M p .Fj(x, y)] p< c£r) 



V 



<2-^l[\\M P M^y)\\ L 



This estimate is summable over all m with positive entries. Using this and the fact that 

p» = U U 

we get 

3 

/ Ty(F ± , F 2 , F 3 )(x)dx < J] HM^F^y)!^. 
^ i=i 
Integration in $y$ and Hölder's inequality give (71).
4. The Case 2 and 3



We analyze the case $B = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, that is

$\Lambda(F_1,F_2,F_3) = \int F_1(x+t,y+s)F_2(x+s,y)F_3(x,y)K(t,s)\,dxdydtds.$

We give an outline of the proof of bounds in the same range as that in Theorem 3.1. We 
first do a cone decomposition as in (2), and analyze expressions like 

$\sum_{k \in \mathbb{Z}} \int_{\mathbb{R}^4} F_1(x+t,y+s)F_2(x+s,y)F_3(x,y)\Psi_k(t)\Phi_k(s)\,dxdydtds.$ (90)

As before, we distinguish two cases. 

4.1. The cone $\int \Phi = 0$.

By using standard reductions, in order to get bounds for (90) it suffices to prove the 
boundedness of the model sum 

r 3 

2 E H^Mx^dxdy (91) 

where for $i \in \{2,3\}$, $\pi^i_\omega$ denotes some projection operator (acting on the $x$ variable) associated with a multiplier adapted to $\omega$, while $\pi^1_\omega$ is the tensor product of a projection as above in the second coordinate and a projection as above on $[-|\omega|, |\omega|]$, in the first coordinate. The relationship in this case between the $\omega_i$ of a given scale $2^k$ is represented by the relations $-\omega_2 = \omega_3 = \omega_1 + C_0 2^k$. While the fact that $-\omega_2 = \omega_3$ is of no particular importance$^{30}$, the only genuine source of orthogonality here comes from the fact that $\omega_3 = \omega_1 + C_0 2^k$.

We will have two types of trees: the 23-trees, those for which$^{31}$ $\omega_{\xi_T,R_T} \subset \omega_3$, and 1-trees, those for which $\omega_{\xi_T,R_T} \subset \omega_1$. As before, the tree will be called $i$-overlapping if $\xi_T \in 2\omega_i$ and $i$-lacunary otherwise. The observation that any tree must have at least one lacunary index is exploited to prove the paraproduct estimate (the analog of Proposition 3.17). We then work with one and a half dimensional phase-space projections on the second and third function, and with two dimensional projections on the first function. The selection algorithms and the Bessel type inequalities needed to control forests are essentially the same as in Section 3.1.5 and Section 3.1.6. As a general observation, we note that, as in the previous case, neither the fact that $-\omega_2 = \omega_3$ nor the separation condition $\omega_3 = \omega_1 + C_0 2^k$ plays any significant role in the selection algorithm and in establishing Bessel's inequality$^{32}$.

We omit the other details, and invite the interested reader to take this as an exercise, 
after reading Section 3. 



30 $\omega_2 = \omega_3$ would have made no difference

31 we follow here the same notation as in Definition 3.6

32 An alternative condition like, say, $\omega_1 = \omega_2 = \omega_3$ would have made no difference, in that part of the argument
4.2. The cone $\int \Psi = 0$.

The same standard reductions make it sufficient to prove the boundedness of the model 
sum 

r 3 

2 E U^F i (x,y)dxdy (92) 

where for $i \in \{2,3\}$, $\pi^i_\omega$ denotes some projection operator (acting on the $x$ variable) associated with a multiplier adapted to $\omega$, while $\pi^1_\omega$ is the tensor product of a projection as above in the second coordinate and a projection as above on $[|\omega|, 2|\omega|]$, in the first coordinate. The relationship in this case between the $\omega_i$ of a given scale $2^k$ is represented by the relations $-\omega_1 = \omega_2 = -\omega_3 + C_0 2^k$. We will now have 12-trees and 3-trees, and again, there should be at least one lacunary index. Moreover, the fact that $\pi^1_\omega$ projects to $[|\omega|, 2|\omega|]$ in the first coordinate is yet another source of orthogonality. It is easy to see that for either type of tree, the paraproduct estimate follows directly by Hölder's inequality (apply square functions on two of the components, one of which is always $F_1$, and a maximal function on the remaining component). We again omit the details.

5. The non-degenerate case 

In this section we briefly show how to analyze the case $\{0,1\} \cap \mathrm{Spec}(B) = \emptyset$.

For a square $Q$, denote with $c(Q) := (c_1(Q), c_2(Q))$ its center. Let $\varphi$ be a function whose Fourier transform is adapted to $[-1/2,1/2] \times [-1/2,1/2]$. With each dyadic box $P := R_P \times \omega_{P_1} \times \omega_{P_2} \times \omega_{P_3}$ with $R_P, \omega_{P_i} \subset \mathbb{R}^2$, and such that $R_P$ has area $|R_P|$ equal to $|\omega_{P_1}|^{-1} = |\omega_{P_2}|^{-1} = |\omega_{P_3}|^{-1}$, we associate three wave-packets $\varphi_{P_i}$, localized (in time-frequency) in the tiles $R_P \times \omega_{P_i}$

VP > [X ' y) ~ \Rp\V^\ |i? P |V2 ' |iJ p |l/2 J 

We perform a wave-packet decomposition of each $F_i$ (as in Section 3.2.1), and a cone decomposition of $K$, and then input these in $\Lambda$. Elementary computations show that, due to the fact that $\{0,1\} \cap \mathrm{Spec}(B) = \emptyset$, all cones are equivalent. Here is what we mean. These computations show that only a few types of $P$ will produce a non-zero contribution to $\Lambda$. A somewhat simplified way of writing the restrictions on a contributing $P$ with scale $|R_P| = 2^{-2k}$ is expressed by the following system of equations:

$c_1(\omega_{P_1}) + c_1(\omega_{P_2}) + c_1(\omega_{P_3}) = 0$

$c_2(\omega_{P_1}) + c_2(\omega_{P_2}) + c_2(\omega_{P_3}) = 0$

$c_1(\omega_{P_1}) + b_{11}c_1(\omega_{P_2}) + b_{21}c_2(\omega_{P_2}) = C_1 2^k$

$c_2(\omega_{P_1}) + b_{12}c_1(\omega_{P_2}) + b_{22}c_2(\omega_{P_2}) = C_2 2^k$

with $\max\{C_1, C_2\} > 100$. Here $b_{ij}$ are the entries of $B$. It is easy to see that the condition $\{0,1\} \cap \mathrm{Spec}(B) = \emptyset$ implies that the family of contributing triples $(\omega_{P_1}, \omega_{P_2}, \omega_{P_3})$ is one-parameter, in that if we specify $c(\omega_{P_i})$ for some $i$ and specify the scale, then the above system has a unique solution. Moreover, the condition $\max\{C_1, C_2\} > 100$ implies that $i$-trees will always be $j$-lacunary, for each $j \ne i$. The approach then follows closely the lines of the proof of the boundedness of the one dimensional Bilinear Hilbert Transform, with no significant modifications (see [10], [11], [16] for details). The outcome is bounds for the operator in the same range as that in Theorem 1.1.
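To see where the two spectral conditions enter, here is a short worked check of the unique solvability claim. It uses the shorthand $\xi_i := c(\omega_{P_i}) \in \mathbb{R}^2$ and $c := (C_1 2^k, C_2 2^k)$, and it assumes that the first two equations of the system have right-hand side $0$ (the three frequencies sum to zero); these shorthands are introduced only for this illustration.

```latex
% Assumed shorthand: \xi_i = c(\omega_{P_i}), c = (C_1 2^k, C_2 2^k),
% B^{T} = transpose of B. In vector form the system reads
\begin{align*}
\xi_1 + \xi_2 + \xi_3 &= 0, & \xi_1 + B^{T}\xi_2 &= c .
\end{align*}
% (a) \xi_2 specified: no condition on B is needed,
\begin{align*}
\xi_1 &= c - B^{T}\xi_2, & \xi_3 &= -\xi_1 - \xi_2 .
\end{align*}
% (b) \xi_1 specified: one must solve
\begin{align*}
B^{T}\xi_2 &= c - \xi_1, && \text{uniquely solvable iff } 0 \notin \mathrm{Spec}(B).
\end{align*}
% (c) \xi_3 specified: substitute \xi_1 = -\xi_2 - \xi_3 to get
\begin{align*}
(B^{T} - I)\,\xi_2 &= c + \xi_3, && \text{uniquely solvable iff } 1 \notin \mathrm{Spec}(B).
\end{align*}
```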

6. Applications to Ergodic theory 

A famous open problem in Ergodic theory concerns the pointwise convergence of the 
bilinear averages for commuting transformations: 

Question 6.1. Let $(X, \Sigma, m)$ be a probability space and let $T, S : X \to X$ be two commuting measurable $m$-preserving point transformations on $X$. Then for each $f, g \in L^\infty(X)$, the following averages converge for almost every $x \in X$

$\frac{1}{N}\sum_{n=1}^N f(T^n x)g(S^n x)$ (93)
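As a purely numerical illustration of the averages in (93) (not part of the argument), the sketch below evaluates them for two commuting rotations of the circle. The choice of system, the functions `f` and `g`, and the rotation numbers are assumptions made for this example only; in this special case convergence is elementary and says nothing about the general question.

```python
import math

def bilinear_average(f, g, T, S, x, N):
    """Return (1/N) * sum_{n=1}^{N} f(T^n x) * g(S^n x)."""
    tx, sx, total = x, x, 0.0
    for _ in range(N):
        tx, sx = T(tx), S(sx)          # advance both orbits by one step
        total += f(tx) * g(sx)
    return total / N

# Hypothetical example system: two rotations of [0, 1), which commute.
ALPHA, BETA = math.sqrt(2.0), math.sqrt(3.0)

def T(x):
    return (x + ALPHA) % 1.0

def S(x):
    return (x + BETA) % 1.0

def f(x):
    return math.cos(2.0 * math.pi * x)

g = f

if __name__ == "__main__":
    for N in (10, 100, 1000, 10000):
        print(N, bilinear_average(f, g, T, S, 0.1, N))
```

For this choice the averages tend to $0$, since the product of the two cosines is a sum of two exponential-type terms with frequencies $\alpha + \beta$ and $\alpha - \beta$, neither of which is an integer.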

This difficult question is known to have a positive answer when $S$ is a power of $T$. This was proved by Bourgain in [3], and then reproved$^{33}$ by the first author in [7]. The techniques we develop in this paper do not seem sufficient by themselves to address the question above in full generality, but we believe they represent an important step towards its resolution. Another step in this program would be to prove bounds for the operator described in Case 6, and ultimately for the bilinear averages

$\int F_1(x+t,y)F_2(x,y+t)\,\frac{dt}{t}.$

We mention however the following related consequences, that come as a by-product of 
our analysis. 

Theorem 6.2. Under the hypothesis above, the following averages converge for almost every $x \in X$

$\frac{1}{N^2}\sum_{n=1}^N\sum_{m=1}^N f(T^nS^mx)g(T^{-n}S^mx)$ (94)

$\frac{1}{N^2}\sum_{n=1}^N\sum_{m=1}^N f(T^nS^mx)g(T^mx)$ (95)

More generally, we can consider the most general problem of this type, that of the convergence of the averages

$\frac{1}{N^2}\sum_{n=1}^N\sum_{m=1}^N f(T^{l_1(n,m)}S^{l_2(n,m)}x)\,g(T^{l_3(n,m)}S^{l_4(n,m)}x)$

where $l_i$ are linear forms in $n$ and $m$. By doing a case analysis (that we omit) it turns out that all these averages can be proved to converge, except for the ones mentioned in the beginning of the section (i.e. $l_1(n,m) = an$, $l_4(n,m) = bn$, $l_2 = l_3 = 0$, and their equivalent versions). This follows either by applying time-frequency methods like in the



33 In [7], a unified approach is used to prove convergence of both averages and their singular series 
counterpart 
case of the averages in Theorem 6.2, or by some trivial manipulations that reduce them to more familiar objects. An example of the latter kind is represented by the averages

N N 
n=l m=l 

While a harmonic analytic approach for them seems unavailable at the moment$^{34}$, these averages are easily seen to separate in $n$ and $m$, and their convergence is immediate by the Pointwise Ergodic theorem.

The convergence results in Theorem 6.2 are consequences of appropriate oscillation inequalities, as explained below. By using standard transference arguments (see for example [2]), one can easily show that the convergence is preserved if the probability space $(X, \Sigma, m)$ is replaced with a sigma finite measure space.

The first part of Theorem 6.2 implies the result from [3], as can easily be seen by choosing $S$ to be the identity transformation. On the other hand, the averages in (95) are of a slightly different nature. While their convergence does not imply the result in [3], it nevertheless implies another important result in Ergodic theory, namely the convergence of the Wiener-Wintner averages. In an equivalent formulation, this result asserts the following: Given any dynamical system $(Y, \mathcal{F}, \mu, R)$, any $F \in L^\infty(Y)$ and any measurable function $N : Y \to [0,1)$, the averages

$\frac{1}{N}\sum_{n=1}^N F(R^ny)\,e^{2\pi i nN(y)}$

converge for almost every y G Y. See [1] for an extensive discussion about the Wiener- 
Wintner property, and [6] and [9] for extensions of this result to series. The above impli- 
cation is easily seen by choosing the sigma finite space to be X := Y x R equipped with 
the product measure 35 m := /x x m £ , then choosing T(y, 9) = (y, 9 + 1), S(y, 9) = (Ry, 9) 
and f(y,9) = F(y), g(y, 9) = e m ^ 9 . 
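
The mechanism behind this choice can be sketched as follows. The notation is partly assumed: $\lambda$ stands for the measurable function $Y \to [0,1)$ of the statement above, and $g(y,\theta) = e^{2\pi i \lambda(y)\theta}$. The identities only record how $f$ and $g$ behave along orbits; how they enter (95) depends on the precise form of the averages there.

```latex
% Assumptions: \lambda is the measurable function Y -> [0,1) of the
% statement, f(y,\theta) = F(y), g(y,\theta) = e^{2\pi i \lambda(y)\theta}.
% T and S commute, and for integers a, b:
\[
T^a S^b (y,\theta) = (R^b y,\ \theta + a).
\]
% f ignores the T-direction, while g picks up a character in it:
\[
f\big(T^a S^b (y,\theta)\big) = F(R^b y), \qquad
g\big(T^a (y,\theta)\big) = e^{2\pi i \lambda(y)\theta}\, e^{2\pi i \lambda(y) a}.
\]
% Hence, for any integers a and n and fixed (y,\theta),
\[
f\big(T^a S^n (y,\theta)\big)\, g\big(T^n (y,\theta)\big)
 = e^{2\pi i \lambda(y)\theta}\; F(R^n y)\, e^{2\pi i \lambda(y) n},
\]
% and the unimodular factor e^{2\pi i \lambda(y)\theta} does not depend on n,
% so averaging in n reproduces the Wiener-Wintner averages up to that factor.
```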

We now say a few words about the proof of Theorem 6.2. The argument follows the
same lines as that in [7], with the extra infusion of techniques developed in this paper.
We briefly touch on the main points, and leave the details to the interested reader. Let us
focus on (94). Standard transfer between $X$ and $\mathbb{R}^2$ using $\mathbb{Z}^2$ as a mediator${}^{36}$ shows that
it suffices to prove an oscillation inequality for



$$\int_{\mathbb{R}^2} F_1(x + t, y + s)\, F_2(x - t, y + s)\, \Psi_k(t)\, \Phi_k(s)\, dt\, ds.$$



We indicate more precisely what this means. Fix an integer $J$ and a finite sequence of
integers $U := u_1 < u_2 < \dots < u_J$. We restrict attention to the cone in Section 3.1, so we
will use the notation introduced there.



34 These averages are connected to the singular integral operators described in Case 6.

35 $m_{\mathcal{L}}$ denotes the Lebesgue measure.

36 The transfer from $\mathbb{R}^2$ to $\mathbb{Z}^2$ is done by using functions constant on all the lattice squares of
sidelength 1. The transfer from $\mathbb{Z}^2$ to $X$ is then mediated by functions living on $x$-orbits, that is,
functions of the form $F(n,m) = f(T^n S^m x)$.
Theorem 6.3. For each $2 < p_1, p_2, p_3 < \infty$ satisfying $\frac{1}{p_1} + \frac{1}{p_2} + \frac{1}{p_3} = 1$, we have

J=l Uj <£<^ +1 Q="x^ 6 Q (96) 

2 fe <|^|<2 11 J + 1 

where $m_{\omega}$ is a multiplier adapted to $\omega \times [|\omega|, 2|\omega|]$. Moreover, the implicit constant is
independent of $J$ and $U$.

The important thing in the oscillation inequality above is that the exponent of J is 
strictly smaller than 1/2. See [7] for more details. 
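
The role of the exponent can be seen from the following standard argument, sketched here in generic notation that is not taken from the paper: $A_k$ stands for a sequence of finite-valued measurable functions on a sigma finite measure space, $p$ for a finite exponent, and $C$, $a$ for constants.

```latex
% Generic sketch; A_k, E, \varepsilon, C, a, p are illustrative names.
% Hypothesis: for every J and all integers u_1 < ... < u_J,
\[
\Big\| \Big( \sum_{j=1}^{J-1} \sup_{u_j \le k < u_{j+1}}
   |A_k - A_{u_{j+1}}|^2 \Big)^{1/2} \Big\|_{L^p} \le C\, J^{a},
   \qquad a < \tfrac12 .
\]
% Suppose A_k fails to converge on a set of positive measure. Then there are
% \varepsilon > 0 and a set E with 0 < |E| < \infty such that for each x in E
% and each u there exist l > k >= u with |A_k(x) - A_l(x)| > 2\varepsilon.
% Given u_j, choose u_{j+1} so large that such k < l < u_{j+1} exist for all
% x in E outside a subset of measure at most 2^{-j-1}|E|. For those x the
% triangle inequality gives
\[
\sup_{u_j \le k < u_{j+1}} |A_k(x) - A_{u_{j+1}}(x)| > \varepsilon .
\]
% The exceptional sets have total measure at most |E|/2, so on a set of
% measure at least |E|/2 all J-1 terms exceed \varepsilon, and therefore
\[
\varepsilon\, (J-1)^{1/2}\, (|E|/2)^{1/p}
 \le \Big\| \Big( \sum_{j=1}^{J-1} \sup_{u_j \le k < u_{j+1}}
   |A_k - A_{u_{j+1}}|^2 \Big)^{1/2} \Big\|_{L^p}
 \le C\, J^{a},
\]
% which is impossible for large J because a < 1/2.
```

With an exponent equal to 1/2 or larger this contradiction is not available, which is why a bound of order $J^{1/4}$ suffices for pointwise convergence.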

Consider an arbitrary sequence of functions $h_1, h_2, \dots, h_{J-1} : \mathbb{R}^2 \to \mathbb{C}$ satisfying

$$\sum_{j=1}^{J-1} |h_j|^2 = 1,$$

and also an arbitrary function $F_3 \in L^{p_3}(\mathbb{R}^2)$. We denote by $j(\omega)$ the unique number in
$\{1, 2, \dots, J-1\}$ such that $2^{u_{j(\omega)}} \le |\omega| < 2^{u_{j(\omega)+1}}$ and by $F_{3,\omega} := F_3 h_{j(\omega)}$. We consider the
stopping times $\kappa_j : \mathbb{R}^2 \to \{u_j, u_j + 1, \dots, u_{j+1} - 1\}$, for each $1 \le j \le J-1$. Using these,
(96) is equivalent to proving that

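$$\Big| \sum_{P \in \mathbf{P}} \int_{\mathbb{R}^2} \chi_{R_P}(x,y)\, \pi^{(1)}_{\omega_P} F_1\, \pi^{(2)}_{\omega_P} F_2\, \pi^{(3)}_{\omega_P}\big( \chi_{\{2^{\kappa_{j(\omega_P)}(\cdot,\cdot)} \le |\omega_P|\}} F_{3,\omega_P}(\cdot,\cdot) \big)(x,y)\, dx\, dy \Big|$$

$$\lesssim J^{1/4}\, \|F_1\|_{p_1} \|F_2\|_{p_2} \|F_3\|_{p_3},$$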

where $\pi^{(i)}_{\omega}$ is the projection associated with $m_{\omega}$. The only difference between this and (7)
is the fact that the third function incorporates an extra truncation and an extra block
localization. We will have exactly the same kind of trees and sizes for $i \in \{1,2\}$ as in
Section 3.1, the only difference being the 3-size, which will have to incorporate
these two new ingredients. We define instead the 3-size by



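$$\mathrm{size}_3(\mathbf{T}) := \Big( \frac{1}{|R_{\mathbf{T}}|} \sum_{P \in \mathbf{T}} \sup_{m_P} \big\| \chi_{R_P}(x,y)\, T_{m_P}\big( \chi_{\{2^{\kappa_{j(\omega_P)}(\cdot,\cdot)} \le |\omega_P| < 2^{u_{j(\omega_P)+1}}\}} F_{3,\omega_P}(\cdot,\cdot) \big)(x,y) \big\|_{L^2}^2 \Big)^{1/2},$$

where $m_P$ ranges over all functions adapted to $\omega_P \times [|\omega_P|, 2|\omega_P|]$.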

The phase-space projections in the case $i \in \{1, 2\}$, and all the estimates in Proposition
3.37 are the same. The only difference is in how we define the phase-space projection of
$F_3$. We define

nsOFj) := E miXG.Fsj)^^), 
jeJ T 

where F 3J = F 3>u> and Gj = {(x,y) : 2 K ^ X ^ < V} if \u\ = 2> . 

We then use Proposition 3.17 and Proposition 3.37 in the same way as before to get
Proposition 3.16. Two things remain to be proved in order to conclude the proof of
Theorem 6.3: a bound for $\mathrm{size}_3$ like the one in Lemma 3.21, and a Bessel type inequality
like the one in Proposition 3.19. The first estimate follows by writing

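$$\frac{1}{|R_{\mathbf{T}}|} \sum_{P \in \mathbf{T}} \big\| \chi_{R_P}(x,y)\, T_{m_P}\big( \chi_{\{2^{\kappa_{j(\omega_P)}(\cdot,\cdot)} \le |\omega_P| < 2^{u_{j(\omega_P)+1}}\}} F_{3,\omega_P}(\cdot,\cdot) \big)(x,y) \big\|_{L^2}^2$$

$$\lesssim \frac{1}{|R_{\mathbf{T}}|} \int \chi_{R_{\mathbf{T}}}(x,y) \sum_{j=1}^{J-1} \sum_{u_j \le k \le u_{j+1}-1} \sum_{|\omega_P| = 2^k} \big| T_{m_P}\big( \chi_{\{2^{\kappa_j(\cdot,\cdot)} \le 2^k\}} F_3(\cdot,\cdot)\, h_j(\cdot,\cdot) \big)(x,y) \big|^2\, dx\, dy$$

$$\lesssim \frac{1}{|R_{\mathbf{T}}|} \int \chi_{R_{\mathbf{T}}}(x,y) \sum_{j=1}^{J-1} |F_3(x,y)\, h_j(x,y)|^2\, dx\, dy$$

$$\lesssim \frac{1}{|R_{\mathbf{T}}|} \int \chi^{8}_{R_{\mathbf{T}}}(x,y)\, |F_3(x,y)|^2\, dx\, dy,$$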



with the penultimate inequality following from the orthogonality of the $T_{m_P}$ for distinct
scales, duality and the boundedness of the maximal truncations of two dimensional singular
integral operators.

On the other hand, the needed Bessel type inequality was proved in Proposition 5.10
in [7]. That is a one dimensional result, but, as explained before, the extension to our
two dimensional context requires no serious modifications.